METHOD FOR CREATING POLYNUCLEOTIDE AND
POLYPEPTIDE SEQUENCES


CROSS-REFERENCES TO RELATED APPLICATIONS
This application derives priority from USSN 60/067908, filed 
December 8, 1997, which is incorporated by reference in its entirety for all purposes. 

TECHNICAL FIELD
The invention resides in the technical field of genetics, and more 
specifically, forced molecular evolution of polynucleotides to acquire desired properties. 



BACKGROUND

A variety of approaches, including rational design and directed evolution, 
have been used to optimize protein functions (1, 2). The choice of approach for a given 
optimization problem depends, in part, on the degree of understanding of the relationships 
between sequence, structure and function. Rational redesign typically requires extensive
knowledge of a structure-function relationship. Directed evolution requires little or no
specific knowledge about the structure-function relationship; rather, the essential feature is a
means to evaluate the function to be optimized. Directed evolution involves the
generation of libraries of mutant molecules followed by selection or screening for the
desired function. Gene products which show improvement with respect to the desired
property or set of properties are identified by selection or screening. The gene(s)
encoding those products can be subjected to further cycles of the process in order to
accumulate beneficial mutations. This evolution can involve few or many generations,
depending on how far one wishes to progress and the effects of mutations typically
observed in each generation. Such approaches have been used to create novel functional
nucleic acids (3, 4), peptides and other small molecules (3), antibodies (3), as well as
enzymes and other proteins (5, 6, 7). These procedures are fairly tolerant to inaccuracies
and noise in the function evaluation (7).

Several publications have discussed the role of gene recombination in
directed evolution (see WO 97/07205, WO 98/42727, US 5,807,723, US 5,721,367,
US 5,776,744, WO 98/41645, US 5,811,238, WO 98/41622, WO 98/41623 and
US 5,093,257).

A PCR-based group of recombination methods consists of DNA shuffling
[5, 6], staggered extension process [89, 90] and random-priming recombination [87].
Such methods typically involve synthesis of significant amounts of DNA during the
assembly/recombination step and subsequent amplification of the final products, and the
efficiency of amplification decreases as gene size increases.

Yeast cells, which possess an active system for homologous
recombination, have been used for in vivo recombination. Cells transformed with a
vector and partially overlapping inserts efficiently join the inserts together in the regions
of homology and restore a functional, covalently-closed plasmid [91]. This method does
not require PCR amplification at any stage of recombination and therefore is free from the
size considerations inherent in PCR-based methods. However, the number of crossovers
introduced in one recombination event is limited by the efficiency of transformation of
one cell with multiple inserts. Other in vivo recombination methods entail recombination
between two parental genes cloned on the same plasmid in a tandem orientation. One
method relies on the homologous recombination machinery of bacterial cells to produce
chimeric genes [92]. A first gene in the tandem provides the N-terminal part of the target
protein, and a second provides the C-terminal part. However, only one crossover can be
generated by this approach. Another in vivo recombination method uses the same tandem
organization of substrates in a vector [93]. Before transformation into E. coli cells,
plasmids are linearized by endonuclease digestion between the parental sequences.
Recombination is performed in vivo by the enzymes responsible for double-strand break
repair. The ends of linear molecules are degraded by a 5'->3' exonuclease activity,
followed by annealing of complementary single-strand 3' ends and restoration of a
double-strand plasmid [94]. This method has advantages and disadvantages similar to those of
tandem recombination on a circular plasmid.


SUMMARY OF THE INVENTION 
The invention provides methods for evolving a polynucleotide toward 
acquisition of a desired property. Such methods entail incubating a population of
parental polynucleotide variants under conditions to generate annealed polynucleotides
comprising heteroduplexes. The heteroduplexes are then exposed to a cellular DNA
repair system to convert the heteroduplexes to parental polynucleotide variants or
recombined polynucleotide variants. The resulting polynucleotides are then screened or
selected for the desired property.

In some methods, the heteroduplexes are exposed to a DNA repair system 
in vitro. A suitable repair system can be prepared in the form of cellular extracts. 

In other methods, the products of annealing including heteroduplexes are 
introduced into host cells. The heteroduplexes are thus exposed to the host cells' DNA
repair system in vivo.

In several methods, the introduction of annealed products into host cells
selects for heteroduplexes relative to transformed cells comprising homoduplexes. Such
selection can be achieved, for example, by providing a first polynucleotide variant as a component
of a first vector, and a second polynucleotide variant as a component of a
second vector. The first and second vectors are converted to linearized forms in which
the first and second polynucleotide variants occur at opposite ends. In the incubating
step, single-stranded forms of the first linearized vector reanneal with each other to form
linear first vector, single-stranded forms of the second linearized vector reanneal with
each other to form linear second vector, and single-stranded linearized forms of the first
and second vectors anneal with each other to form a circular heteroduplex bearing a nick in
each strand. Introduction of the products into cells thus selects for circular heteroduplexes
relative to the linear first and second vector. Optionally, in the above methods, the first
and second vectors can be converted to linearized forms by PCR. Alternatively, the first
and second vectors can be converted to linearized forms by digestion with first and
second restriction enzymes.

In some methods, polynucleotide variants are provided in double stranded 
form and are converted to single stranded form before the annealing step. Optionally, 
such conversion is by conducting asymmetric amplification of the first and second
double stranded polynucleotide variants to amplify a first strand of the first
polynucleotide variant and a second strand of the second polynucleotide variant. The
first and second strands anneal in the incubating step to form a heteroduplex.

In some methods, a population of polynucleotides comprising first and 
second polynucleotides is provided in double stranded form, and the method further 



WO 99/29902



PCT/US98/25698 




comprises incorporating the first and second polynucleotides as components of first and
second vectors, whereby the first and second polynucleotides occupy opposite ends of
the first and second vectors. In the incubating step, single-stranded forms of the first
linearized vector reanneal with each other to form linear first vector, single-stranded
forms of the second linearized vector reanneal with each other to form linear second
vector, and single-stranded linearized forms of the first and second vectors anneal with
each other to form a circular heteroduplex bearing a nick in each strand. The introducing
step selects for transformed cells comprising the circular heteroduplexes relative to the
linear first and second vector.

In some methods, the first and second polynucleotides are obtained from
chromosomal DNA. In some methods, the polynucleotide variants encode variants of a
polypeptide. In some methods, the population of polynucleotide variants comprises at
least 20 variants. In some methods, the polynucleotide variants in the population are at least
10 kb in length.

In some methods, the polynucleotide variants comprise natural variants.
In other methods, the polynucleotide variants comprise variants generated by mutagenic
PCR or cassette mutagenesis. In some methods, the host cells into which heteroduplexes
are introduced are bacterial cells. In some methods, the population of
polynucleotide variants comprises at least 5 polynucleotides having at least 90% sequence
identity with one another.

Some methods further comprise a step of at least partially demethylating 
variant polynucleotides. Demethylation can be achieved by PCR amplification or by
passaging variants through methylation-deficient host cells. 

Some methods include a further step of sealing one or more nicks in
heteroduplex molecules before exposing the heteroduplexes to a DNA repair system.
Nicks can be sealed by treatment with DNA ligase.

Some methods further comprise a step of isolating a screened recombinant
polynucleotide variant. In some methods, the polynucleotide variant is screened to
produce a recombinant protein or a secondary metabolite whose production is catalyzed
thereby.

In some methods, the recombinant protein or secondary metabolite is
formulated with a carrier to form a pharmaceutical composition.

In some methods, the polynucleotide variants encode enzymes selected
from the group consisting of proteases, lipases, amylases, cutinases, cellulases,
oxidases, peroxidases and phytases. In other methods, the polynucleotide variants encode
a polypeptide selected from the group consisting of insulin, ACTH, glucagon,
somatostatin, somatotropin, thymosin, parathyroid hormone, pigmentary hormones,
somatomedin, erythropoietin, luteinizing hormone, chorionic gonadotropin, hypothalamic
releasing factors, antidiuretic hormones, thyroid stimulating hormone, relaxin, interferon,
thrombopoietin (TPO), and prolactin.

In some methods, each polynucleotide in the population of variant
polynucleotides encodes a plurality of enzymes forming a metabolic pathway.



BRIEF DESCRIPTION OF THE DRAWINGS 
Figure 1 illustrates the process of heteroduplex formation using
polymerase chain reaction (PCR) with one set of primers for each different sequence to
amplify the target sequence and vector.

Figure 2 illustrates the process of heteroduplex formation using restriction
enzymes to linearize the target sequences and vector.

Figure 3 illustrates a process of heteroduplex formation using asymmetric
or single primer polymerase chain reaction (PCR) with one set of primers for each
different sequence to amplify the target sequence and vector.

Figure 4 illustrates heteroduplex recombination using unique restriction
enzymes (X and Y) to remove the homoduplexes.

Figure 5 shows the amino acid sequences of the FlaA from R. lupini (SEQ
ID NO:1) and R. meliloti (SEQ ID NO:2).

Figures 6A and 6B show the locations of the unique restriction sites
utilized to linearize pRL20 and pRM40.

Figures 7A, B, C and D show the DNA sequences of four mosaic flaA
genes created by in vitro heteroduplex formation followed by in vivo repair ((a) is SEQ
ID NO:3, (b) is SEQ ID NO:4, (c) is SEQ ID NO:5 and (d) is SEQ ID NO:6).

Figure 8 illustrates how the heteroduplex repair process created mosaic
flaA genes containing sequence information from both parent genes.

Figure 9 shows the physical maps of Actinoplanes utahensis ECB
deacylase mutants with enhanced specific activity ((a) is pM7-2 for Mutant 7-2, and (b) is
pM16 for Mutant 16).

Figure 10 illustrates the process used for Example 2 to recombine
mutations in Mutant 7-2 and Mutant 16 to yield ECB deacylase recombinant with more
enhanced specific activity.

Figure 11 shows specific activities of wild-type ECB deacylase and
improved mutants Mutant 7-2, Mutant 16 and recombined Mutant 15.

Figure 12 shows positions of DNA base changes and amino acid
substitutions in recombined ECB deacylase Mutant 15 with respect to parental sequences
of Mutant 7-2 and Mutant 16.

Figures 13 A, B, C, D and E show the DNA sequence of the A. utahensis ECB
deacylase gene mutant M-15 created by in vitro heteroduplex formation followed
by in vivo repair (SEQ ID NO:7).

Figure 14 illustrates the process used for Example 3 to recombine
mutations in RC1 and RC2 to yield thermostable subtilisin E.

Figure 15 illustrates the sequences of RC1 and RC2 and the ten clones
picked randomly from the transformants of the reaction products of duplex formation as
described in Example 3. The x's correspond to base positions that differ between RC1 and
RC2. The mutation at 995 corresponds to an amino acid substitution at 181, while that at
1107 corresponds to an amino acid substitution at 218 in the subtilisin protein sequence.

Figure 16 shows the results of screening 400 clones from the library
created by heteroduplex formation and repair for initial activity (Ai) and residual activity
(Ar). The ratio Ai/Ar was used to estimate the enzymes' thermostability. Data from
active variants are sorted and plotted in descending order. Approximately 12.9% of the
clones exhibit a phenotype corresponding to the double mutant containing both the
N181D and the N218S mutations.

DEFINITIONS 

Screening is, in general, a two-step process in which one first physically
separates the cells and then determines which cells do and do not possess a desired
property. Selection is a form of screening in which identification and physical separation
are achieved simultaneously by expression of a selection marker, which, in some genetic
circumstances, allows cells expressing the marker to survive while other cells die (or vice
versa). Exemplary screening members include luciferase, β-galactosidase and green
fluorescent protein. Selection markers include drug and toxin resistance genes. Although
spontaneous selection can and does occur in the course of natural evolution, in the present
methods selection is performed by man.

An exogenous DNA segment is one foreign (or heterologous) to the cell or
homologous to the cell but in a position within the host cell nucleic acid in which the
element is not ordinarily found. Exogenous DNA segments are expressed to yield
exogenous polypeptides.

The term gene is used broadly to refer to any segment of DNA associated 
with a biological function. Thus, genes include coding sequences and/or the regulatory 
sequences required for their expression. Genes also include nonexpressed DNA segments 
that, for example, form recognition sequences for other proteins.

The term "wild-type" means that the nucleic acid fragment does not 
comprise any mutations. A "wild-type" protein means that the protein will be active at a
level of activity found in nature and typically will comprise the amino acid sequence 
found in nature. In an aspect, the term "wild type" or "parental sequence" can indicate a 
starting or reference sequence prior to a manipulation of the invention. 

"Substantially pure" means an object species is the predominant species 
present (i.e., on a molar basis it is more abundant than any other individual 
macromolecular species in the composition), and preferably a substantially purified 
fraction is a composition wherein the object species comprises at least about 50 percent 
(on a molar basis) of all macromolecular species present. Generally, a substantially pure 
composition will comprise more than about 80 to 90 percent of all macromolecular 
species present in the composition. Most preferably, the object species is purified to 
essential homogeneity (contaminant species cannot be detected in the composition by 
conventional detection methods) wherein the composition consists essentially of a single 
macromolecular species. Solvent species, small molecules (<500 Daltons), and elemental
ion species are not considered macromolecular species. 

Percentage sequence identity is calculated by comparing two optimally
aligned sequences over the window of comparison, determining the number of positions
at which the identical nucleic acid base occurs in both sequences to yield the number of
matched positions, dividing the number of matched positions by the total number of
positions in the window of comparison, and expressing the result as a percentage. Optimal alignment of sequences for aligning a
comparison window can be conducted by computerized implementations of algorithms
GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software Package
Release 7.0, Genetics Computer Group, 575 Science Dr., Madison, WI.
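
The calculation described above can be sketched in a few lines of Python. This is only an illustration, not the GAP or BESTFIT implementation: it assumes the two sequences have already been optimally aligned, that they have equal length over the comparison window, and that a hyphen marks an alignment gap. The function name percent_identity is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage identity over a pre-aligned comparison window.

    Assumptions of this sketch: both strings are already aligned,
    have the same length, and use '-' for alignment gaps.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("comparison window is empty")
    # A position is matched when both sequences carry the same base there.
    matched = sum(
        1
        for base_a, base_b in zip(aligned_a.upper(), aligned_b.upper())
        if base_a == base_b and base_a != "-"
    )
    # Matched positions divided by window length, expressed as a percentage.
    return 100.0 * matched / len(aligned_a)


if __name__ == "__main__":
    # Ten aligned positions, one mismatch (T versus A at the sixth position).
    print(percent_identity("ATGGCTAAGT", "ATGGCAAAGT"))
```

In the demo window, nine of ten positions match, so the function reports 90 percent identity.
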
The term naturally-occurring is used to describe an object that can be
found in nature as distinct from being artificially produced by man. For example, a
polypeptide or polynucleotide sequence that is present in an organism (including viruses)
that can be isolated from a source in nature and which has not been intentionally modified
by man in the laboratory is naturally-occurring. Generally, the term naturally-occurring
refers to an object as present in a non-pathological (undiseased) individual, such as would
be typical for the species.

A nucleic acid is operably linked when it is placed into a functional
relationship with another nucleic acid sequence. For instance, a promoter or enhancer is
operably linked to a coding sequence if it increases the transcription of the coding
sequence. Operably linked means that the DNA sequences being linked are typically
contiguous and, where necessary to join two protein coding regions, contiguous and in
reading frame. However, since enhancers generally function when separated from the
promoter by several kilobases and intronic sequences may be of variable lengths, some
polynucleotide elements may be operably linked but not contiguous.

A specific binding affinity between, for example, a ligand and a receptor,
means a binding affinity of at least 1 x 10^ M^-1.

The term "cognate" as used herein refers to a gene sequence that is
evolutionarily and functionally related between species. For example but not limitation,
in the human genome, the human CD4 gene is the cognate gene to the mouse CD4 gene,
since the sequences and structures of these two genes indicate that they are highly
homologous and both genes encode a protein which functions in signaling T cell
activation through MHC class II-restricted antigen recognition.

The term "heteroduplex" refers to hybrid DNA generated by base pairing
between complementary single strands derived from the different parental duplex
molecules, whereas the term "homoduplex" refers to double-stranded DNA generated by
base pairing between complementary single strands derived from the same parental
duplex molecules.

The term "nick" in duplex DNA refers to the absence of a phosphodiester
bond between two adjacent nucleotides on one strand. The term "gap" in duplex DNA
refers to an absence of one or more nucleotides in one strand of the duplex. The term
"loop" in duplex DNA refers to one or more unpaired nucleotides in one strand.

A mutant or variant sequence is a sequence showing substantial variation
from a wild type or reference sequence that differs from the wild type or reference
sequence at one or more positions.

DETAILED DESCRIPTION 


I. General 

The invention provides methods of evolving a polynucleotide toward
acquisition of a desired property. The substrates for the method are a population of at
least two polynucleotide variant sequences that contain regions of similarity with each
other but which also have point(s) or regions of divergence. The substrates are annealed
in vitro at the regions of similarity. Annealing can regenerate initial substrates or can
form heteroduplexes, in which component strands originate from different parents. The
products of annealing are exposed to enzymes of a DNA repair system, and optionally a
replication system, that repair unmatched pairings. Exposure can be in vivo, as when
annealed products are transformed into host cells and exposed to the host's DNA repair
system. Alternatively, exposure can be in vitro, as when annealed products are exposed
to cellular extracts containing functional DNA repair systems. Exposure of
heteroduplexes to a DNA repair system results in DNA repair at bulges in the
heteroduplexes due to DNA mismatching. The repair process differs from homologous
recombination in promoting nonreciprocal exchange of diversity between strands. The
DNA repair process is typically effected on both component strands of a heteroduplex
molecule and at any particular mismatch is typically random as to which strand is
repaired. The resulting population can thus contain recombinant polynucleotides
encompassing an essentially random reassortment of points of divergence between
parental strands. The population of recombinant polynucleotides is then screened for
acquisition of a desired property. The property can be a property of the polynucleotide
per se, such as capacity of a DNA molecule to bind to a protein, or can be a property of an
expression product thereof, such as mRNA or a protein.
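
The random reassortment described in this section can be illustrated with a toy simulation. The sketch below is not a model of any actual repair enzyme. It assumes two parents of equal length written in the same sense, and it assumes that each mismatch is resolved independently toward one parent or the other with equal probability, which is a simplification. The function name repair_heteroduplex is hypothetical.

```python
import random


def repair_heteroduplex(parent_a: str, parent_b: str, rng: random.Random) -> str:
    """Return one repaired product from a heteroduplex of two aligned parents.

    Matched positions are copied unchanged. At each mismatch, the base of one
    parent is kept, chosen at random (independent sites are an assumption).
    """
    if len(parent_a) != len(parent_b):
        raise ValueError("this sketch requires parents of equal length")
    return "".join(
        base_a if base_a == base_b else rng.choice((base_a, base_b))
        for base_a, base_b in zip(parent_a, parent_b)
    )


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed only to make the demo repeatable
    parent_a = "ATGGCTAAGTTC"
    parent_b = "ATGACTAAATTT"  # differs from parent_a at three positions
    products = {repair_heteroduplex(parent_a, parent_b, rng) for _ in range(200)}
    recombinants = products - {parent_a, parent_b}
    print(len(products), "distinct products,", len(recombinants), "recombinant")
```

With three points of divergence, at most eight distinct products are possible under these assumptions: the two parents and up to six recombinants.
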

II. Substrates For Shuffling

The substrates for shuffling are variants of a reference polynucleotide that
show some region(s) of similarity with the reference and other region(s) or point(s) of
divergence. Regions of similarity should be sufficient to support annealing of
polynucleotides such that stable heteroduplexes can be formed. Variant forms often
show substantial sequence identity with each other (e.g., at least 50%, 75%, 90% or 99%).
There should be at least sufficient diversity between substrates that recombination can
generate more diverse products than there are starting materials. Thus, there must be at
least two substrates differing in at least two positions. The degree of diversity depends on
the length of the substrate being recombined and the extent of the functional change to be
evolved. Diversity at between 0.1% and 25% of positions is typical. Recombination of
mutations from very closely related genes or even whole sections of sequences from more
distantly related genes or sets of genes can enhance the rate of evolution and the
acquisition of desirable new properties. Recombination to create chimeric or mosaic
genes can be useful in order to combine desirable features of two or more parents into a
single gene or set of genes, or to create novel functional features not found in the parents.
The number of different substrates to be combined can vary widely in size from two to
10, 100, 1000, to more than 10^, 10^ or 10^ members.
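
The requirement that two substrates differ in at least two positions can be checked with a short calculation. The sketch below assumes two aligned substrates of equal length and assumes that every divergent position can be inherited independently from either parent, so that k divergent positions allow at most 2^k distinct products. The function names are hypothetical.

```python
from typing import List


def divergent_positions(seq_a: str, seq_b: str) -> List[int]:
    """Indices at which two aligned, equal-length substrates differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("this sketch requires substrates of equal length")
    return [i for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


def percent_divergence(seq_a: str, seq_b: str) -> float:
    """Share of positions that differ, as a percentage."""
    return 100.0 * len(divergent_positions(seq_a, seq_b)) / len(seq_a)


def max_products_from_two_parents(seq_a: str, seq_b: str) -> int:
    """Upper bound on distinct products if each divergent site sorts independently."""
    return 2 ** len(divergent_positions(seq_a, seq_b))


if __name__ == "__main__":
    a, b = "ACGTACGTAC", "ACGAACGTTC"
    print(divergent_positions(a, b))
    print(percent_divergence(a, b))
    print(max_products_from_two_parents(a, b))
```

With a single divergent position the bound is two, which is just the two parents. With two divergent positions the bound is four, so two products can be new, which is why two differences are the minimum.
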

The initial small population of the specific nucleic acid sequences having
mutations may be created by a number of different methods. Mutations may be created
by error-prone PCR. Error-prone PCR uses low-fidelity polymerization conditions to
introduce a low level of point mutations randomly over a long sequence. Alternatively,
mutations can be introduced into the template polynucleotide by oligonucleotide-directed
mutagenesis. In oligonucleotide-directed mutagenesis, a short sequence of the
polynucleotide is removed from the polynucleotide using restriction enzyme digestion
and is replaced with a synthetic polynucleotide in which various bases have been altered
from the original sequence. The polynucleotide sequence can also be altered by chemical
mutagenesis. Chemical mutagens include, for example, sodium bisulfite, nitrous acid,
hydroxylamine, hydrazine or formic acid. Other agents which are analogues of
nucleotide precursors include nitrosoguanidine, 5-bromouracil, 2-aminopurine, or
acridine. Generally, these agents are added to the PCR reaction in place of the nucleotide
precursor thereby mutating the sequence. Intercalating agents such as proflavine,
acriflavine, quinacrine and the like can also be used. Random mutagenesis of the
polynucleotide sequence can also be achieved by irradiation with X-rays or ultraviolet
light. Generally, plasmid DNA or DNA fragments so mutagenized are introduced into E.
coli and propagated as a pool or library of mutant plasmids.

Alternatively, the small mixed population of specific nucleic acids can be
found in nature in the form of different alleles of the same gene or the same gene from
different related species (i.e., cognate genes). Alternatively, substrates can be related but
nonallelic genes, such as the immunoglobulin genes. Diversity can also be the result of
previous recombination or shuffling. Diversity can also result from resynthesizing genes
encoding natural proteins with alternative codon usage.
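
As an illustration of the last point, the sketch below encodes the same short peptide with two different synonymous codon choices, giving two DNA substrates that differ in sequence but encode the same protein. The peptide, the five-residue codon table and the function names are hypothetical and chosen only for the example. The codons themselves are from the standard genetic code.

```python
# Partial table of synonymous codons (standard genetic code), for illustration.
SYNONYMOUS_CODONS = {
    "M": ["ATG"],
    "K": ["AAA", "AAG"],
    "E": ["GAA", "GAG"],
    "F": ["TTT", "TTC"],
    "D": ["GAT", "GAC"],
}
CODON_TO_AMINO_ACID = {
    codon: amino_acid
    for amino_acid, codons in SYNONYMOUS_CODONS.items()
    for codon in codons
}


def encode(protein: str, choice: int) -> str:
    """Encode a protein, taking the codon at index `choice` (wrapping around)."""
    return "".join(
        SYNONYMOUS_CODONS[aa][choice % len(SYNONYMOUS_CODONS[aa])] for aa in protein
    )


def translate(dna: str) -> str:
    """Translate an in-frame coding sequence using the partial table above."""
    return "".join(CODON_TO_AMINO_ACID[dna[i:i + 3]] for i in range(0, len(dna), 3))


if __name__ == "__main__":
    peptide = "MKEFD"
    version_1 = encode(peptide, 0)
    version_2 = encode(peptide, 1)
    print(version_1, version_2)
    print(translate(version_1) == translate(version_2) == peptide)
```

The two versions differ at four third-codon positions yet translate to the same peptide, so they could in principle serve as divergent substrates.
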

The starting substrates encode variant forms of sequences to be evolved.
In some methods, the substrates encode variant forms of a protein for which evolution of
a new or modified property is desired. In other methods, the substrates can encode
variant forms of a plurality of genes constituting a multigene pathway. In such methods,
variation can occur in one or any number of the component genes. In other methods,
substrates can contain variant segments to be evolved as DNA or RNA binding
sequences. In methods in which starting substrates contain coding sequences, any
essential regulatory sequences, such as a promoter and polyadenylation sequence,
required for expression may also be present as a component of the substrate.
Alternatively, such regulatory sequences can be provided as components of vectors used
for cloning the substrates.

The starting substrates can vary in length from about 50, 250, 1000,
10,000, 100,000, 10^ or more bases. The starting substrates can be provided in double- or
single-stranded form. The starting substrates can be DNA or RNA and analogs thereof.
If DNA, the starting substrates can be genomic or cDNA. If the substrates are RNA, the
substrates are typically reverse-transcribed to cDNA before heteroduplex formation.
Substrates can be provided as cloned fragments, chemically synthesized fragments or
PCR amplification products. Substrates can derive from chromosomal, plasmid or viral
sources. In some methods, substrates are provided in concatemeric form.


III. Procedures for Generating Heteroduplexes

Heteroduplexes are generated from double stranded DNA substrates by
denaturing the DNA substrates and incubating under annealing conditions. Hybridization
conditions for heteroduplex formation are sequence-dependent and are different in
different circumstances. Longer sequences hybridize specifically at higher temperatures.
Generally, hybridization conditions are selected to be about 25°C lower than the thermal
melting point (Tm) for the specific sequence at a defined ionic strength and pH. The Tm
is the temperature (under defined ionic strength, pH, and nucleic acid concentration) at
which 50% of the probes complementary to the target sequence hybridize to the target
sequence at equilibrium.
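
The rule of thumb above can be sketched numerically. The text does not give a formula for Tm, so the sketch below borrows a commonly used empirical approximation for long duplexes, Tm = 81.5 + 16.6 log10[Na+] + 0.41 (%GC) - 675/N. That formula, the function names and the example sequence are assumptions introduced for illustration only. Only the 25°C offset comes from the text.

```python
import math


def approx_tm_celsius(sequence: str, sodium_molar: float) -> float:
    """Rough Tm estimate for a long duplex (empirical formula, an assumption)."""
    sequence = sequence.upper()
    length = len(sequence)
    if length == 0:
        raise ValueError("sequence is empty")
    if sodium_molar <= 0:
        raise ValueError("sodium concentration must be positive")
    gc_percent = 100.0 * (sequence.count("G") + sequence.count("C")) / length
    return (
        81.5
        + 16.6 * math.log10(sodium_molar)
        + 0.41 * gc_percent
        - 675.0 / length
    )


def hybridization_temperature(tm_celsius: float, offset_celsius: float = 25.0) -> float:
    """Hybridization temperature chosen about 25 degrees C below Tm, per the text."""
    return tm_celsius - offset_celsius


if __name__ == "__main__":
    substrate = "ACGT" * 250  # hypothetical 1000-base substrate, 50% GC
    tm = approx_tm_celsius(substrate, sodium_molar=0.18)
    print(round(tm, 1), round(hybridization_temperature(tm), 1))
```

The estimate is only a starting point. In practice the annealing temperature would be adjusted empirically for the substrates at hand.
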

Exemplary conditions for denaturation and renaturation of double stranded
substrates are as follows. Equimolar concentrations (~1.0-5.0 nM) of the substrates are
mixed in 1 x SSPE buffer (180 mM NaCl, 1.0 mM EDTA, 10 mM NaH2PO4, pH 7.4).
After heating at 96°C for 10 minutes, the reaction mixture is immediately cooled at 0°C
for 5 minutes. The mixture is then incubated at 68°C for 2-6 hr. Denaturation and
reannealing can also be carried out by the addition and removal of a denaturant such as
NaOH. The process is the same for single stranded DNA substrates, except that the
denaturing step may be omitted for short sequences.
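
To mix substrates at equimolar concentration, a nanomolar target has to be converted into a mass of DNA. The sketch below shows that arithmetic. It assumes an average mass of about 650 g/mol per base pair of double stranded DNA, which is a common approximation and not a value given in the text. The reaction volume and vector length in the demo are likewise hypothetical.

```python
AVERAGE_G_PER_MOL_PER_BP = 650.0  # approximate average for dsDNA (assumption)


def dsdna_mass_ng(concentration_nm: float, volume_ul: float, length_bp: int) -> float:
    """Mass of double stranded DNA needed for a target molar concentration."""
    moles = (concentration_nm * 1e-9) * (volume_ul * 1e-6)  # mol/L times L
    grams = moles * length_bp * AVERAGE_G_PER_MOL_PER_BP
    return grams * 1e9  # grams to nanograms


if __name__ == "__main__":
    # Hypothetical case: a 5000 bp linearized vector at 1 nM in a 20 microliter reaction.
    print(dsdna_mass_ng(concentration_nm=1.0, volume_ul=20.0, length_bp=5000))
```

For the hypothetical case shown, 20 fmol of a 5000 bp molecule corresponds to roughly 65 ng, and each substrate would be added at the same molar amount.
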

By appropriate design of substrates for heteroduplex formation, it is
possible to achieve selection for heteroduplexes relative to reformed parental
homoduplexes. Homoduplexes merely reconstruct parental substrates and effectively
dilute recombinant products in subsequent screening steps. In general, selection is
achieved by designing substrates such that heteroduplexes are formed as open circles,
whereas homoduplexes are formed as linear molecules. A subsequent transformation step
results in substantial enrichment (e.g., 100-fold) for the circular heteroduplexes.
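
The effect of such an enrichment can be worked through with simple arithmetic. The sketch below assumes that the stated fold enrichment acts as a relative transformation efficiency of heteroduplexes over homoduplexes, and that half of the annealed duplexes are heteroduplexes, as would follow from random reannealing of two equimolar substrates. Both assumptions, and the function name, are introduced for illustration.

```python
def heteroduplex_share_of_transformants(hetero_fraction: float, enrichment: float) -> float:
    """Fraction of transformants arising from heteroduplexes.

    hetero_fraction: share of annealed duplexes that are heteroduplexes.
    enrichment: relative transformation efficiency, heteroduplex over homoduplex.
    """
    if not 0.0 <= hetero_fraction <= 1.0:
        raise ValueError("hetero_fraction must lie between 0 and 1")
    if enrichment <= 0:
        raise ValueError("enrichment must be positive")
    weighted_hetero = hetero_fraction * enrichment
    weighted_homo = 1.0 - hetero_fraction
    return weighted_hetero / (weighted_hetero + weighted_homo)


if __name__ == "__main__":
    share = heteroduplex_share_of_transformants(hetero_fraction=0.5, enrichment=100.0)
    print(round(100.0 * share, 1), "percent of transformants from heteroduplexes")
```

Under these assumptions, a 100-fold enrichment raises the heteroduplex-derived share of transformants from one half to about 99 percent.
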

Figure 1 shows a method in which two substrate sequences in separate
vectors are PCR-amplified using two different sets of primers (P1, P2 and P3, P4).
Typically, first and second substrates are inserted into separate copies of the same vector.
The two different pairs of primers initiate amplification at different points on the two
vectors. Fig. 1 shows an arrangement in which the P1/P2 primer pair initiates
amplification at one of the two boundaries of the vector with the substrate and the P3/P4
primer pair initiates replication at the other boundary in a second vector. The two primers
in each primer pair prime amplification in opposite directions around a circular plasmid.
The amplification products generated by this amplification are double-stranded linearized
vector molecules in which the first and second substrates occur at opposite ends of the
vector. The amplification products are mixed, denatured and annealed. Mixing and
denaturation can be performed in either order. Reannealing generates two linear
homoduplexes, and an open circular heteroduplex containing one nick in each strand, at
the initiation point of PCR amplification. Introduction of the amplification products into
host cells selects for the heteroduplexes relative to the homoduplexes because the former
transform much more efficiently than the latter.
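
The topology argument in this scheme can be expressed as a small model. In the sketch below, each linearized vector is reduced to a name and the coordinate on the circular plasmid map at which it was opened. The class, the function and the coordinates are hypothetical. The model captures only the geometric point made in the text: strands opened at the same point reanneal into a linear molecule, while strands opened at different points anneal into an open circle with one nick per strand.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class LinearizedVector:
    name: str
    break_position: int  # map coordinate where amplification was initiated


def anneal(top: LinearizedVector, bottom: LinearizedVector) -> Tuple[str, Tuple[int, ...]]:
    """Return (topology, nick positions) for a duplex of the two given strands."""
    if top.break_position == bottom.break_position:
        # Strand ends coincide, so the duplex has two free ends and no nicks.
        return ("linear", ())
    # Strand ends are staggered, so each strand bridges the other's break point.
    return ("open circle", (top.break_position, bottom.break_position))


if __name__ == "__main__":
    first = LinearizedVector("first vector", break_position=0)
    second = LinearizedVector("second vector", break_position=1200)
    for top in (first, second):
        for bottom in (first, second):
            print(top.name, "+", bottom.name, "->", anneal(top, bottom))
```

Running the demo lists the four strand pairings: the two homoduplex pairings come out linear, and the two heteroduplex pairings come out as open circles with two nicks.
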

It is not essential in the above scheme that amplification is initiated at the
interface between substrate and the rest of the vector. Rather, amplification can be
initiated at any points on two vectors bearing substrates provided that the amplification is
initiated at different points between the vectors. In the general case, such amplification
generates two linearized vectors in which the first and second substrates respectively
occupy different positions relative to the remainder of the vector. Denaturation and
reannealing generate heteroduplexes similar to that shown in Fig. 1, except that the nicks
occur within the vector component rather than at the interface between plasmid and
substrate. Initiation of amplification outside the substrate component of a vector has the
advantage that it is not necessary to design primers specific for the substrate borne by the
vector.

Although Fig. 1 is exemplified for two substrates, the above scheme can be
extended to any number of substrates. For example, an initial population of vectors
bearing substrates can be divided into two pools. One pool is PCR-amplified from one
set of primers, and the other pool from another. The amplification products are denatured
and annealed as before. Heteroduplexes can form containing one strand from any
substrate in the first pool and one strand from any substrate in the second pool.
Alternatively, three or more substrates cloned into multiple copies of a vector can be
subjected to amplification with amplification in each vector starting at a different point.
For each substrate, this process generates amplification products varying in how flanking
vector DNA is divided on the two sides of the substrate. For example, one amplification
product might have most of the vector on one side of the substrate, another amplification
product might have most of the vector on the other side of the substrate, and a further
amplification product might have an equal division of vector sequence flanking the
substrate. In the subsequent annealing step, a strand of substrate can form a circular
heteroduplex with a strand of any other substrate, but strands of the same substrate can
only reanneal with each other to form a linear homoduplex. In a still further variation,
multiple substrates can be recombined by performing multiple iterations of the scheme in
Fig. 1. After the first iteration, recombinant polynucleotides in a vector undergo
heteroduplex formation with a third substrate incorporated into a further copy of the
vector. The vector bearing the recombinant polynucleotides and the vector bearing the
third substrate are separately PCR amplified from different primer pairs. The
amplification products are then denatured and annealed. The process can be repeated
further times to allow recombination with further substrates.

An alternative scheme for heteroduplex formation is shown in Fig. 2.
Here, first and second substrates are incorporated into separate copies of a vector. The
two copies are then respectively digested with different restriction enzymes. Fig. 2 shows
an arrangement in which the restriction enzymes cut at opposite boundaries between
substrates and vector, but all that is necessary is to use two different restriction enzymes
that cut at different places. Digestion generates linearized first and second vectors bearing
first and second substrates, the first and second substrates occupying different positions
relative to the remaining vector sequences. Denaturation and reannealing generate open
circular heteroduplexes and linear homoduplexes. The scheme can be extended to
recombination between more than two substrates using analogous strategies to those
described with respect to Fig. 1. In one variation, two pools of substrates are formed, and
each is separately cloned into vector. The two pools are then cut with different enzymes,
and annealing proceeds as for two substrates. In another variation, three or more
substrates can be cloned into three or more copies of vector, and the three or more resulting
molecules cut with three or more enzymes, cutting at three or more sites. This generates
three or more different linearized vector forms differing in the division of vector sequences
flanking the substrate moiety in the vectors. Alternatively, any number of substrates can
be recombined pairwise in an iterative fashion, with products of one round of
recombination annealing with a fresh substrate in each round.

In a further variation, heteroduplexes can be formed from substrate
molecules in vector-free form, and the heteroduplexes subsequently cloned into vectors.
This can be achieved by asymmetric amplification of first and second substrates as
shown in Fig. 3. Asymmetric or single-primer PCR amplifies only one strand of a duplex.
By appropriate selection of primers, opposite strands can be amplified from two different
substrates. On reannealing amplification products, heteroduplexes are formed from
opposite strands of the two substrates. Because only one strand is amplified from each
substrate, reannealing does not reform homoduplexes (other than for small quantities of
unamplified substrate). The process can be extended to allow recombination of any
number of substrates using analogous strategies to those described with respect to Fig. 1.
For example, substrates can be divided into two pools, and each pool subjected to the same
asymmetric amplification, such that amplification products of one pool can only anneal
with amplification products of the other pool, and not with each other. Alternatively,
shuffling can proceed pairwise in an iterative manner, in which recombinants formed
from heteroduplexes of first and second substrates are subsequently subjected to
heteroduplex formation with a third substrate. Point mutations can also be introduced at a
desired level during PCR amplification.
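
The pairing logic of the asymmetric amplification scheme can be sketched as follows. This
is a simplified model under stated assumptions: single-primer amplification is taken to
yield only the named strand of each substrate, residual unamplified substrate is ignored,
and the function and substrate names are hypothetical.

```python
from itertools import product


def asymmetric_products(pool, strand):
    """Return the single-stranded products of single-primer amplification.

    Every substrate in `pool` contributes only the named strand
    ("top" or "bottom"); the opposite strand is assumed not to be amplified.
    """
    return [(substrate, strand) for substrate in pool]


def anneal(products_a, products_b):
    """Pair complementary strands; strands of like polarity cannot anneal."""
    duplexes = []
    for (sub_a, strand_a), (sub_b, strand_b) in product(products_a, products_b):
        if strand_a != strand_b:
            duplexes.append((sub_a, sub_b))
    return duplexes


if __name__ == "__main__":
    # Hypothetical pools: the top strand is amplified from one pool and
    # the bottom strand from the other.
    tops = asymmetric_products(["A1", "A2"], "top")
    bottoms = asymmetric_products(["B1"], "bottom")
    print(anneal(tops, bottoms))   # duplexes formed between the two pools
    print(anneal(tops, tops))      # no duplexes form within a pool
```

In this model every duplex contains one strand from each pool, which is why homoduplexes
are not reformed in appreciable quantity.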

Fig. 4 shows another approach for selecting for heteroduplexes relative to
homoduplexes. First and second substrates are isolated by PCR amplification from
separate vectors. The substrates are denatured and allowed to anneal, forming both
heteroduplexes and reconstructed homoduplexes. The products of annealing are digested
with restriction enzymes X and Y. X has a site in the first substrate but not the second
substrate, and vice versa for Y. Enzyme X cuts reconstructed homoduplex from the first
substrate and enzyme Y cuts reconstructed homoduplex from the second substrate.
Neither enzyme cuts heteroduplexes. Heteroduplexes can effectively be separated from
restriction fragments of homoduplexes by further cleavage with enzymes A and B having
sites proximate to the ends of both the first and second substrates, and ligation of the
products into vector having cohesive ends compatible with ends resulting from digestion
with A and B. Only heteroduplexes cut with A and B can ligate with the vector.
Alternatively, heteroduplexes can be separated from restriction fragments of
homoduplexes by size selection on gels. The above process can be generalized to N
substrates by cleaving the mixture of heteroduplexes and homoduplexes with N enzymes,
each one of which cuts a different substrate and no other substrate.

Heteroduplexes can also be formed by directional cloning. Two substrates for
heteroduplex formation can be obtained by PCR amplification of chromosomal DNA and
joined to opposite ends of a linear vector. Directional cloning can be achieved by
digesting the vector with two different enzymes, and digesting or adapting first and
second substrates to be respectively compatible with cohesive ends of only one of the two
enzymes used to cut the vector. The first and second substrates can thus be ligated at
opposite ends of a linearized vector fragment. This scheme can be extended to any number
of substrates by using principles analogous to those described for Fig. 1. For example,
substrates can be divided into two pools before ligation to the vector. Alternatively,
recombinant products formed by heteroduplex formation of first and second substrates can
subsequently undergo heteroduplex formation with a third substrate.
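
The restriction-based selection of Fig. 4, including its generalization to N substrates,
can be summarized in a short sketch. The sketch assumes, as stated above, that an enzyme
cuts a duplex only when both strands derive from the single substrate carrying its site;
the function name, enzyme names and substrate names are hypothetical.

```python
def surviving_duplexes(duplexes, enzyme_sites):
    """Return the duplexes left uncut after digestion with all enzymes.

    duplexes:     list of (top_substrate, bottom_substrate) pairs.
    enzyme_sites: dict mapping an enzyme name to the one substrate that
                  carries its recognition site.
    """
    survivors = []
    for top, bottom in duplexes:
        # A duplex is cut only if both strands come from the substrate
        # bearing the site, i.e. only reconstructed homoduplexes are cut.
        cut = any(top == owner and bottom == owner
                  for owner in enzyme_sites.values())
        if not cut:
            survivors.append((top, bottom))
    return survivors


if __name__ == "__main__":
    duplexes = [("S1", "S1"), ("S1", "S2"), ("S2", "S1"), ("S2", "S2")]
    sites = {"X": "S1", "Y": "S2"}  # X cuts only S1, Y cuts only S2
    print(surviving_duplexes(duplexes, sites))
```

Only the mixed pairings survive digestion in this model, so subsequent cleavage with A and
B and ligation into vector recovers heteroduplexes.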


IV. Vectors and Transformation 

In general, substrates are incorporated into vectors either before or after 
the heteroduplex formation step. A variety of cloning vectors typically used in genetic 
engineering are suitable. 

The vectors containing the DNA segments of interest can be transferred
into the host cell by standard methods, depending on the type of cellular host. For
example, calcium chloride transformation is commonly utilized for prokaryotic cells,
whereas calcium phosphate treatment, lipofection, or electroporation may be used for
other cellular hosts. Other methods used to transform mammalian cells include the use of
Polybrene, protoplast fusion, liposomes, electroporation, microinjection, and
biolistics (see, generally, Sambrook et al., supra). Viral vectors can also be packaged in
vitro and introduced by infection. The choice of vector depends on the host cells. In
general, a suitable vector has an origin of replication recognized in the desired host cell, a
selection marker capable of being expressed in the intended host cells and/or regulatory
sequences to support expression of genes within substrates being shuffled.

V. Types of Host Cells 

In general, any type of cells supporting DNA repair and replication of
heteroduplexes introduced into the cells can be used. Cells of particular interest are the
standard cell types commonly used in genetic engineering, such as bacteria, particularly
E. coli (16, 17). Suitable E. coli strains include E. coli mutS, mutL, dam-, and/or recA;
E. coli XL-10-Gold (Tetr Δ(mcrA)183 Δ(mcrCB-hsdSMR-mrr)173 endA1 supE44 thi-1
recA1 gyrA96 relA1 lac Hte [F' proAB lacIqZΔM15 Tn10 (Tetr) Amy Camr]); and E. coli
ES1301 mutS [Genotype: lacZ53, mutS201::Tn5, thyA36, rha-3, metB1, deoC,
IN(rrnD-rrnE)] (20, 24, 28-42). Preferred E. coli strains are E. coli SCS110 [Genotype:
rpsL (Strr), thr, leu, endA, thi-1, lacY, galK, galT, ara, tonA, tsx, dam, dcm, supE44,
Δ(lac-proAB), [F', traD36, proA+B+, lacIqZΔM15]], which have normal cellular mismatch
repair systems (17). This strain type repairs mismatches and unmatched regions in the
heteroduplex with little strand-specific preference. Further, because this strain is dam-
and dcm-, plasmid isolated from the strain is unmethylated and therefore particularly
amenable for further rounds of DNA duplex formation/mismatch repair (see below). Other
suitable bacterial cells include gram-negative and gram-positive bacteria, such as
Bacillus, Pseudomonas, and Salmonella.

Eukaryotic organisms are also able to carry out mismatch repair (43-48).
Mismatch repair systems in both prokaryotes and eukaryotes are thought to play an
important role in the maintenance of genetic fidelity during DNA replication, in the
outcome of genetic recombinations, and in genome stability. Some of the genes that play
important roles in mismatch repair in prokaryotes, particularly mutS and mutL, have
homologs in eukaryotes. Wild-type or mutant S. cerevisiae has been shown to carry out
mismatch repair of heteroduplexes (49-56), as have COS-1 monkey cells (57). Preferred
yeasts are Pichia and Saccharomyces. Mammalian cells have been shown to
have the capacity to repair G-T to G-C base pairs by a short-patch mechanism (38,
58-63). Mammalian cells (e.g., mouse, hamster, primate, human), both cell lines and primary
cultures, can also be used. Such cells include stem cells, including embryonic stem cells,
zygotes, fibroblasts, lymphocytes, Chinese hamster ovary (CHO), mouse fibroblasts
(NIH3T3), kidney, liver, muscle, and skin cells. Other eukaryotic cells of interest include
plant cells, such as maize, rice, wheat, cotton, soybean, sugarcane, tobacco, and
Arabidopsis; fish, algae, fungi (Aspergillus, Podospora, Neurospora), and insect cells (e.g.,
baculo lepidoptera) (see Winnacker, "From Genes to Clones," VCH Publishers, N.Y. (1987),
which is incorporated herein by reference).

In vivo repair occurs in a wide variety of prokaryotic and eukaryotic cells.
Use of mammalian cells is advantageous in certain applications in which substrates encode
polypeptides that are expressed only in mammalian cells or which are intended for use in
mammalian cells. However, bacterial and yeast cells are advantageous for screening
large libraries due to the higher transformation frequencies attainable in these strains.

V. In Vitro DNA Repair Systems

As an alternative to introducing annealed products into host cells, annealed
products can be exposed to a DNA repair system in vitro. The DNA repair system can be
obtained as extracts from repair-competent E. coli, yeast or any other cells (64-67).
Repair-competent cells are lysed in appropriate buffer and supplemented with
nucleotides. DNA is incubated in this cell extract and transformed into competent cells
for replication.

VI. Screening and Selection

After introduction of annealed products into host cells, the host cells are
typically cultured to allow repair and replication to occur and, optionally, for genes
encoded by polynucleotides to be expressed. The recombinant polynucleotides can be
subjected to further rounds of recombination using the heteroduplex procedures described
above, or other shuffling methods described below. However, whether after one cycle of
recombination or several, recombinant polynucleotides are subjected to screening or
selection for a desired property. In some instances, screening or selection is performed in
the same host cells that are used for DNA repair. In other instances, recombinant
polynucleotides, their expression products or secondary metabolites produced by the
expression products are isolated from such cells and screened in vitro. In other instances,
recombinant polynucleotides are isolated from the host cells in which recombination
occurs and are screened or selected in other host cells. For example, in some methods, it
is advantageous to allow DNA repair to occur in a bacterial host strain, but to screen an
expression product of recombinant polynucleotides in eukaryotic cells. The recombinant
polynucleotides surviving screening or selection are sometimes useful products in
themselves. In other instances, such recombinant polynucleotides are subjected to further
recombination with each other or other substrates. Such recombination can be effected by
the heteroduplex methods described above or any other shuffling methods. Further
round(s) of recombination are followed by further rounds of screening or selection on an
iterative basis. Optionally, the stringency of selection can be increased at each round.

The nature of screening or selection depends on the desired property
sought to be acquired. Desirable properties of enzymes include high catalytic activity,
capacity to confer resistance to drugs, high stability, the ability to accept a wider (or
narrower) range of substrates, or the ability to function in nonnatural environments such
as organic solvents. Other desirable properties of proteins include capacity to bind a
selected target, secretion capacity, capacity to generate an immune response to a given
target, lack of immunogenicity, and toxicity to pathogenic microorganisms. Desirable
properties of DNA or RNA polynucleotide sequences include capacity to specifically
bind a given protein target, and capacity to regulate expression of operably linked coding
sequences. Some of the above properties, such as drug resistance, can be selected by
plating cells on the drug. Other properties, such as the influence of a regulatory sequence
on expression, can be screened by detecting appearance of the expression product of a
reporter gene linked to the regulatory sequence. Other properties, such as capacity of an
expressed protein to be secreted, can be screened by FACS™, using a labelled antibody to
the protein. Other properties, such as immunogenicity or lack thereof, can be screened by
isolating protein from individual cells or pools of cells, and analyzing the protein in vitro
or in a laboratory animal.


VII. Variations 

1. Demethylation

Most cell types methylate DNA in some manner, with the pattern of
methylation differing between cell types. Methylated bases include 5-methylcytosine
(m5C), N4-methylcytosine (m4C), N6-methyladenine (m6A), 5-hydroxymethylcytosine
(hm5C) and 5-hydroxymethyluracil (hm5U). In E. coli, methylation is effected by Dam
and Dcm enzymes. The methylase specified by the dam gene methylates the N6-position
of the adenine residue in the sequence GATC, and the methylase specified by the dcm
gene methylates the C5-position of the internal cytosine residue in the sequence
CCWGG. DNA from plants and mammals is often subject to CG methylation, meaning
that CG or CNG sequences are methylated. Possible effects of methylation on cellular
repair are discussed by references 18-20.
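
As an illustration of the recognition sequences just described, the following sketch
locates candidate Dam (GATC) and Dcm (CCWGG, where W is A or T) sites on one strand of a
sequence. The function name and the test sequence are hypothetical, and positions are
reported as zero-based start coordinates of each site.

```python
import re

# Lookahead patterns so that overlapping occurrences are all reported.
DAM_SITE = re.compile(r"(?=GATC)")
DCM_SITE = re.compile(r"(?=CC[AT]GG)")


def methylation_sites(seq):
    """Return zero-based start positions of Dam and Dcm recognition sites.

    Within each site, the methylated base lies one position after the
    start: the adenine of GATC and the internal cytosine of CCWGG.
    """
    seq = seq.upper()
    return {
        "dam": [m.start() for m in DAM_SITE.finditer(seq)],
        "dcm": [m.start() for m in DCM_SITE.finditer(seq)],
    }


if __name__ == "__main__":
    # Hypothetical sequence, used only to exercise the function.
    print(methylation_sites("AAGATCTTCCAGGTTCCTGGGATC"))
```

Such a scan indicates how many sites in a substrate would be methylated in a dam+ dcm+
host and therefore how much a demethylation step might matter.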

In some methods, DNA substrates for heteroduplex formation are at least
partially demethylated on one or both strands, preferably the latter. Demethylation of
substrate DNA promotes efficient and random repair of the heteroduplexes. In
heteroduplexes formed with one strand dam-methylated and one strand unmethylated,
repair is biased to the unmethylated strand, with the methylated strand serving as the
template for correction. If neither strand is methylated, mismatch repair occurs, but
shows insignificant strand preference (23, 24).

Demethylation can be performed in a variety of ways. In some methods,
substrate DNA is demethylated by PCR amplification. In some instances, DNA
demethylation is accomplished in one of the PCR steps in the heteroduplex formation
procedures described above. In other methods, an additional PCR step is performed to
effect demethylation. In other methods, demethylation is effected by passaging substrate
DNA through methylation-deficient host cells (e.g., an E. coli dam- dcm- strain). In other
methods, substrate DNA is demethylated in vitro using demethylating enzymes.
Demethylated DNA is used for heteroduplex formation using the same procedures
described above. Heteroduplexes are subsequently introduced into DNA-repair-proficient
but restriction-enzyme-defective cells to prevent degradation of the unmethylated
heteroduplexes.

2. Sealing Nicks 

Several of the methods for heteroduplex formation described above result 
in circular heteroduplexes bearing nicks in each strand. These nicks can be sealed before 
introducing heteroduplexes into host cells. Sealing can be effected by treatment with 
DNA ligase under standard ligating conditions. Ligation forms a phosphodiester bond to
link two adjacent bases separated by a nick in one strand of a DNA double helix.
Sealing of nicks increases the frequency of recombination after introduction of 
heteroduplexes into host cells. 

3. Error-Prone PCR Attendant To Amplification

Several of the formats described above include a PCR amplification step.
Optionally, such a step can be performed under mutagenic conditions to induce additional 
diversity between substrates. 

VIII. Other Shuffling Methods 

The methods of heteroduplex formation described above can be used in 
conjunction with other shuffling methods. For example, one can perform one cycle of 
heteroduplex shuffling, screening or selection, followed by a cycle of shuffling by another
method, followed by a further cycle of screening or selection. Other shuffling formats are
described by WO 95/22625; US 5,605,793; US 5,811,238; WO 96/19256; Stemmer,
Science 270, 1510 (1995); Stemmer et al., Gene 164, 49-53 (1995); Stemmer,
Bio/Technology 13, 549-553 (1995); Stemmer, Proc. Natl. Acad. Sci. USA 91,
10747-10751 (1994); Stemmer, Nature 370, 389-391 (1994); Crameri et al., Nature Medicine
2(1):1-3 (1996); Crameri et al., Nature Biotechnology 14, 315-319 (1996); WO 98/42727;
WO 98/41622; WO 98/05764; WO 98/42728; and WO 98/27230 (each of which is
incorporated by reference in its entirety for all purposes).

IX. Protein Analogs

Proteins isolated by the methods also serve as lead compounds for the
development of derivative compounds. The derivative compounds can include chemical
modifications of amino acids or replace amino acids with chemical structures. The
analogs should have a stabilized electronic configuration and molecular conformation that
allows key functional groups to be presented in substantially the same way as a lead
protein. In particular, the non-peptidic compounds have spatial electronic properties which
are comparable to the polypeptide binding region, but will typically be much smaller
molecules than the polypeptides, frequently having a molecular weight below about 2
kD and preferably below about 1 kD. Identification of such non-peptidic compounds
can be performed through several standard methods such as self-consistent field (SCF)
analysis, configuration interaction (CI) analysis, and normal mode dynamics analysis.
Computer programs for implementing these techniques are readily available. See Rein et
al., Computer-Assisted Modeling of Receptor-Ligand Interactions (Alan Liss, New York,
1989).



IX. Pharmaceutical Compositions

Polynucleotides, their expression products, and secondary metabolites
whose formation is catalyzed by expression products, generated by the above methods are
optionally formulated as pharmaceutical compositions. Such a composition comprises
one or more active agents, and a pharmaceutically acceptable carrier. A variety of
aqueous carriers can be used, e.g., water, buffered water, phosphate-buffered saline
(PBS), 0.4% saline, 0.3% glycine, human albumin solution and the like. These solutions
are sterile and generally free of particulate matter. The compositions may contain
pharmaceutically acceptable auxiliary substances as required to approximate
physiological conditions, such as pH adjusting and buffering agents, toxicity adjusting agents
and the like, for example, sodium acetate, sodium chloride, potassium chloride, calcium
chloride and sodium lactate. The concentration of active agent is selected primarily
based on fluid volumes, viscosities, and so forth, in accordance with the particular mode
of administration selected.
EXAMPLES

EXAMPLE 1. Novel Rhizobium FlaA Genes From Recombination Of Rhizobium lupini
FlaA And Rhizobium meliloti FlaA

Bacterial flagella have a helical filament, a proximal hook and a basal
body with the flagellar motor (68). This basic design has been extensively examined in
E. coli and S. typhimurium and is broadly applicable to many other bacteria as well as
some archaea. The long helical filaments are polymers assembled from flagellin subunits,
whose molecular weights range between 20,000 and 65,000, depending on the bacterial
species (69). Two types of flagellar filaments, named plain and complex, have been
distinguished by their electron microscopically determined surface structures (70). Plain
filaments have a smooth surface with faint helical lines, whereas complex filaments
exhibit a conspicuous helical pattern of alternating ridges and grooves. These
characteristics of complex flagellar filaments are considered to be responsible for the
brittle and (by implication) rigid structure that enables them to propel bacteria efficiently
in viscous media (71-73). Whereas flagella with plain filaments can alternate between
clockwise and counterclockwise rotation (68), all known flagella with complex filaments
rotate only clockwise with intermittent stops (74). Since this latter navigation pattern is
found throughout bacteria and archaea, it has been suggested that complex flagella may
reflect the common background of an ancient, basic motility design (69).

Complex filaments differ from plain bacterial flagella in their fine structure,
which is dominated by conspicuous helical bands, and in their fragility; they are
also resistant to heat decomposition (72). Schmitt et al. (75) showed that
bacteriophage 7-7-1 specifically adsorbs to the complex flagella of R. lupini H13-3 and
requires motility for a productive infection of its host. Though the flagellins from R.
meliloti and R. lupini are quite similar, bacteriophage 7-7-1 does not infect R. meliloti.
Until now complex flagella have been observed in only three species of soil bacteria:
Pseudomonas rhodos (73), R. meliloti (76), and R. lupini H13-3 (70, 72). Cells of R. lupini
H13-3 possess 5 to 10 peritrichously inserted complex flagella, which were first isolated
and analyzed by high resolution electron microscopy and by optical diffraction (70).

Maruyama et al. (77) further found that a higher content of hydrophobic
amino acid residues in the complex filament may be one of the main reasons for the
unusual properties of complex flagella. By measuring mass per unit length and obtaining
three-dimensional reconstruction from electron micrographs, Trachtenberg et al. (73, 78)
suggested that the complex filaments of R. lupini are composed of functional dimers.
Figure 6 shows the comparison between the deduced amino acid sequence of the R. lupini
H13-3 FlaA and the deduced amino acid sequence of the R. meliloti FlaA. Perfect
matches are indicated by vertical lines, and conservative exchanges are indicated by
colons. The overall identity is 56%. The R. lupini flaA and R. meliloti flaA were subjected
to in vitro heteroduplex formation followed by in vivo repair in order to create novel
FlaA molecules and structures.
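
An overall identity figure of the kind quoted above is computed from a pairwise alignment.
The following sketch shows one common convention, assumed here for illustration only:
identical residues divided by the number of aligned columns, with gaps counted as
non-matching. The function name and the short sequences are hypothetical and are not the
FlaA sequences of Figure 6.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two aligned sequences of equal length.

    Assumed convention: matches are identical non-gap residues, and the
    denominator is the number of columns that are not gap-versus-gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = list(zip(aligned_a, aligned_b))
    matches = sum(1 for a, b in pairs if a == b and a != gap)
    columns = sum(1 for a, b in pairs if not (a == gap and b == gap))
    return 100.0 * matches / columns if columns else 0.0


if __name__ == "__main__":
    # Hypothetical toy alignment with one substitution and one gap.
    print(round(percent_identity("MAST-LK", "MASSALK"), 1))
```

Other conventions (for example, dividing by the length of the shorter sequence) give
somewhat different values, so the convention should be stated alongside any reported
figure.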

A. Methods 

pRL20 containing R. lupini H13-3 flaA gene and pRM40 containing
R. meliloti flaA gene are shown in Figs. 6A and 6B. These plasmids were isolated from E.
coli SCS110 (free from dam- and dcm-type methylation).

About 3.0 µg of unmethylated pRL20 and pRM40 DNA were digested with Bam HI and
Eco RI, respectively, at 37°C for 1 hour. After agarose gel separation, the linearized
DNA was purified with Wizard PCR Prep kit (Promega, WI, USA).

Equimolar concentrations (2.5 nM) of the linearized unmethylated pRL20 and pRM40
were mixed in 1 x SSPE buffer (180 mM NaCl, 1 mM EDTA, 10 mM NaH2PO4, pH
7.4). After heating at 96°C for 10 minutes, the reaction mixture was immediately cooled
at 0°C for 5 minutes. The mixture was incubated at 68°C for 2 hours for heteroduplexes to
form.
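
When setting up an equimolar mixture of this kind, a molar concentration must be
converted to a mass concentration for each plasmid. The sketch below assumes an average
mass of about 650 g/mol per base pair of double-stranded DNA, a commonly used
approximation; the function name and the plasmid length in the usage example are
hypothetical, since the sizes of pRL20 and pRM40 are not stated here.

```python
AVG_BP_MASS = 650.0  # g/mol per base pair; an assumed, commonly used approximation


def nm_to_ng_per_ul(conc_nm, length_bp, bp_mass=AVG_BP_MASS):
    """Convert a dsDNA concentration from nM to ng/µl.

    conc_nm:   molar concentration in nanomolar.
    length_bp: length of the linear or circular duplex in base pairs.
    """
    molar_mass = length_bp * bp_mass               # g/mol
    grams_per_liter = conc_nm * 1e-9 * molar_mass  # mol/L times g/mol
    return grams_per_liter * 1e3                   # 1 g/L equals 1000 ng/µl


if __name__ == "__main__":
    # Hypothetical 5,000 bp plasmid at the 2.5 nM used in the annealing step.
    print(nm_to_ng_per_ul(2.5, 5000))
```

Because the conversion scales with plasmid length, two plasmids of different sizes require
different masses of DNA to reach the same molar concentration.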

One microliter of the reaction mixture was used to transform 50 µl of E.
coli ES1301 mutS, E. coli SCS110 and E. coli JM109 competent cells. The
transformation efficiency with E. coli JM109 competent cells was about seven times
higher than that of E. coli SCS110 and ten times higher than that of E. coli ES1301 mutS,
although the overall transformation efficiencies were 10-200 times lower than those of
control transformations with the closed, covalent, circular pUC19 plasmid.

Two clones were selected at random from the E. coli SCS110
transformants and two from the E. coli ES1301 mutS transformants, and plasmid DNA was
isolated from these four clones for further DNA sequencing analysis.
B. Results

Figure 7 shows (a) the sequence of SCS01 (clone #1 from the E. coli SCS110
transformant library), (b) the sequence of SCS02 (clone #2 from the E. coli SCS110
transformant library), (c) the sequence of ES01 (clone #1 from the E. coli ES1301
transformant library), and (d) the sequence of ES02 (clone #2 from the E. coli ES1301
transformant library). All four sequences were different from wild-type R. lupini flaA and
R. meliloti flaA sequences. Clones SCS02, ES01 and ES02 all contain a complete open
reading frame, but SCS01 was truncated. Figure 8 shows that recombination mainly
occurred in the loop regions (unmatched regions). The flaA mutant library generated
from R. meliloti flaA and R. lupini flaA can be transformed into E. coli SCS110, ES1301,
XL10-Gold and JM109, and transformants screened for functional FlaA recombinants.

EXAMPLE 2. Directed Evolution Of ECB Deacylase For Variants With Enhanced
Specific Activity

Streptomyces are among the most important industrial microorganisms due
to their ability to produce numerous important secondary metabolites (including many
antibiotics) as well as large amounts of enzymes. The approach described here can be
used with little modification for directed evolution of native Streptomyces enzymes,
some or all of the genes in a metabolic pathway, as well as other heterologous enzymes
expressed in Streptomyces.

New antifungal agents are critically needed by the large and growing
numbers of immune-compromised AIDS, organ transplant and cancer chemotherapy
patients who suffer opportunistic infections. Echinocandin B (ECB), a lipopeptide
produced by some species of Aspergillus, has been studied extensively as a potential
antifungal. Various antifungal agents with significantly reduced toxicity have been
generated by replacing the linoleic acid side chain of A. nidulans echinocandin B with
different aryl side chains (79-83). The cyclic hexapeptide ECB nucleus precursor for the
chemical acylation is obtained by enzymatic hydrolysis of ECB using Actinoplanes
utahensis ECB deacylase. To maximize the conversion of ECB into intact nucleus, this
reaction is carried out at pH 5.5 with a small amount of miscible organic solvent to
solubilize the ECB substrate. The product cyclic hexapeptide nucleus is unstable at pH
above 5.5 during the long incubation required to fully deacylate ECB (84). The pH
optimum of ECB deacylase, however, is 8.0-8.5 and its activity is reduced at pH 5.5 and
in the presence of more than 2.5% ethanol (84). To improve production of ECB nucleus
it is necessary to increase the activity of the ECB deacylase under these process-relevant
conditions.
Relatively little is known about ECB deacylase. The enzyme is a
heterodimer whose two subunits are derived by processing of a single precursor protein
(83). The 19.9 kD α-subunit is separated from the 60.4 kD β-subunit by a 15-amino acid
spacer peptide that is removed along with a signal peptide and another spacer peptide in
the native organism. The polypeptide is also expressed and processed into functional
enzyme in Streptomyces lividans, the organism used for large-scale conversion of ECB by
recombinant ECB deacylase. The three-dimensional structure of the enzyme has not been
determined, and its sequence shows so little similarity to other possibly related enzymes
such as penicillin acylase that a structural model reliable enough to guide a rational effort
to engineer the ECB deacylase will be difficult to build. We therefore decided to use
directed evolution (85) to improve this important activity.

Protocols suitable for mutagenic PCR and random-priming recombination
of the 2.4 kb ECB deacylase gene (73% G+C) have been described recently (86). Here,
we further describe the use of heteroduplex recombination to generate new ECB
deacylase with enhanced specific activity.

In this case, two Actinoplanes utahensis ECB deacylase mutants, M7-2
and M16, which show higher specific activity at pH 5.5 and in the presence of 10%
MeOH, were recombined using the technique of in vitro heteroduplex formation and in
vivo mismatch repair.

Figure 12 shows the physical maps of plasmids pM7-2 and pM16, which
contain the genes for the M7-2 and M16 ECB deacylase mutants. Mutant M7-2 was
obtained through mutagenic PCR performed directly on whole Streptomyces lividans cells
containing wild-type ECB deacylase gene, expressed from plasmid pSHP150-2*.
Streptomyces with pM7-2 show 1.5 times the specific activity of cells expressing the
wild-type ECB deacylase (86). Clone pM16 was obtained using the random-priming
recombination technique as described (86, 87). It shows 2.4 times the specific activity of the
wild-type ECB deacylase clone.

A. Methods

M7-2 and M16 plasmid DNA (pM7-2 and pM16) (Fig. 9) were purified
from E. coli SCS110 (in separate reactions). About 5.0 µg of unmethylated M7-2 and
M16 DNA were digested with Xho I and Psh AI, respectively, at 37°C for 1 hour (Fig.
10). After agarose gel separation, the linearized DNA was purified using a Wizard PCR
Prep Kit (Promega, WI, USA).

Equimolar concentrations (2.0 nM) of the linearized unmethylated pM7-2 and pM16 
DNA were mixed in 1 x SSPE buffer (1 x SSPE: 180 mM NaCl, 1.0 mM EDTA, 10 mM 
NaH2PO4, pH 7.4). After heating at 96°C for 10 minutes, the reaction mixture was 
immediately cooled at 0°C for 5 minutes. The mixture was then incubated at 68°C for 3 
hours to promote formation of heteroduplexes. 
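
The annealing step above yields a mixture of reannealed parental duplexes and heteroduplexes. As a rough illustration only, the following sketch estimates the expected proportions under the idealized assumption (ours, not stated in the protocol) that every denatured strand pairs with a complementary strand at random, regardless of which parent it came from. The function name and the parent labels are hypothetical.

```python
def duplex_fractions(parent_amounts):
    """Expected (homoduplex, heteroduplex) fractions after random reannealing.

    parent_amounts maps a parent name to its molar amount in the mixture.
    Assumption: strands pair at random, so a duplex is a homoduplex with
    probability sum(p_i ** 2), where p_i is the molar fraction of parent i.
    """
    total = sum(parent_amounts.values())
    if total <= 0:
        raise ValueError("total amount must be positive")
    probs = [amount / total for amount in parent_amounts.values()]
    homo = sum(p * p for p in probs)
    return homo, 1.0 - homo


# Equimolar mix of the two linearized plasmids (2.0 nM each, as in the text).
homo, hetero = duplex_fractions({"pM7-2": 2.0, "pM16": 2.0})
print(f"homoduplex: {homo:.2f}, heteroduplex: {hetero:.2f}")
```

Under this assumption an equimolar two-parent mixture gives equal amounts of homoduplex and heteroduplex, and adding more parents raises the heteroduplex share.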

One microliter of the reaction mixture was used to transform 50 µl of 
E. coli ES1301 mutS, SCS110 and JM109 competent cells. All transformants from E. coli 
ES1301 mutS were pooled, and all transformants from E. coli SCS110 were pooled. A 
plasmid pool was isolated from each pooled library, and this pool was used to transform 
S. lividans TK23 protoplasts to form a mutant library for deacylase activity screening. 
Transformants from the S. lividans TK23 libraries were screened for ECB deacylase 
activity with an in situ plate assay. Transformed protoplasts were allowed to regenerate 
on R2YE agar plates for 24 hr at 30°C and to develop in the presence of thiostrepton for 
48 hours. When the colonies grew to the proper size, 6 ml of 0.7% agarose solution 
containing 0.5 mg/ml ECB in 0.1 M sodium acetate buffer (pH 5.5) was poured on top of 
each R2YE-agar plate and allowed to develop for 18-24 hr at 30°C. Colonies surrounded 
by a clearing zone larger than that of a control colony containing wild-type plasmid 
pSHP150-2* were selected for further characterization. 

Selected transformants were inoculated into 20 ml medium containing 
thiostrepton and grown aerobically at 30°C for 48 hours, at which point they were 
analyzed for ECB deacylase activity using HPLC. 100 µl of whole broth was used for a 
reaction at 30°C for 30 minutes in 0.1 M NaAc buffer (pH 5.5) containing 10% (v/v) 
MeOH and 200 µg/ml of ECB substrate. The reactions were stopped by adding 2.5 
volumes of methanol, and 20 µl of each sample were analyzed by HPLC on a 100 x 4.6 
mm polyhydroxyethyl aspartamide column (PolyLC Inc., Columbia, MD, USA) at room 
temperature using a linear acetonitrile gradient starting with 50:50 of A:B (A = 93% 
acetonitrile, 0.1% phosphoric acid; B = 70% acetonitrile, 0.1% phosphoric acid) and 
ending with 30:70 of A:B in 22 min at a flow rate of 2.2 ml/min. The areas of the ECB 
and ECB nucleus peaks were calculated and subtracted from the areas of the 
corresponding peaks from a sample culture of S. lividans containing pIJ702* in order to 
estimate the ECB deacylase activity. 
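
For readers who want the gradient expressed as solvent composition, the sketch below converts the A:B mixing ratio into percent acetonitrile at a given time. It assumes the gradient is linear in the proportion of B over the 22 minutes and that the two solvents mix ideally by volume; the function and parameter names are illustrative and not part of the described assay.

```python
def acetonitrile_percent(t_min, total_min=22.0, start_b=0.50, end_b=0.70,
                         acn_in_a=93.0, acn_in_b=70.0):
    """Percent acetonitrile delivered at time t_min of the linear gradient.

    Defaults follow the text: A = 93% and B = 70% acetonitrile, with the
    mixture running from 50:50 to 30:70 (A:B) over 22 minutes.
    """
    if not 0.0 <= t_min <= total_min:
        raise ValueError("time is outside the gradient window")
    frac_b = start_b + (end_b - start_b) * t_min / total_min
    return (1.0 - frac_b) * acn_in_a + frac_b * acn_in_b


for t in (0.0, 11.0, 22.0):
    print(f"t = {t:4.1f} min: {acetonitrile_percent(t):.1f}% acetonitrile")
```

Under these assumptions the gradient is shallow, falling from 81.5% to 76.9% acetonitrile over the run.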

2.0 ml pre-cultures of positive mutants were used to inoculate 50 ml of 
medium and allowed to grow at 30°C for 96 hr. The supernatants were further 
concentrated to 1/30 their original volume using an Amicon filtration unit (Beverly, MA, 
USA) with a molecular weight cutoff of 10 kD. The resulting enzyme samples were 
diluted with an equal volume of 50 mM KH2PO4 (pH 6.0) buffer and were applied to a 
Hi-Trap ion exchange column (Pharmacia Biotech, Piscataway, NJ, USA). The binding 
buffer was 50 mM KH2PO4 (pH 6.0), and the elution buffer was 50 mM KH2PO4 (pH 6.0) 
containing 1.0 M NaCl. A linear gradient from 0 to 1.0 M NaCl was applied in 8 column 
volumes with a flow rate of 2.7 ml/min. The ECB deacylase fraction eluting at 0.3 M 
NaCl was concentrated and the buffer was exchanged for 50 mM KH2PO4 (pH 6.0) using 
Centricon-10 units. Enzyme purity was verified by SDS-PAGE using Coomassie Blue 
stain, and the concentration was determined using the Bio-Rad Protein Assay Reagent 
(Hercules, CA, USA). 

A modified HPLC assay was used to determine the activities of the ECB 
deacylase mutants on ECB substrate (84). Four µg of each purified ECB deacylase 
mutant was used for the activity assay reaction at 30°C for 30 minutes in 0.1 M NaAc buffer 
(pH 5.5) containing 10% (v/v) MeOH and different concentrations of ECB substrate. 
Assays were performed in duplicate. The reactions were stopped by adding 2.5 volumes 
of methanol and the HPLC assays were carried out as described above. The absorbance 
values were recorded, and the initial rates were calculated by least-squares regression of 
the time progress curves, from which the Km and the kcat were calculated. 
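
The text does not say how Km and kcat were derived from the initial rates. One common route, shown here purely as an illustrative sketch, is a Hanes-Woolf linearization of the Michaelis-Menten equation ([S]/v = [S]/Vmax + Km/Vmax) followed by an ordinary least-squares line fit. The data below are synthetic values generated from assumed parameters, not measurements from this work, and all function names are hypothetical.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def michaelis_menten_fit(substrate, rates):
    """Estimate (Km, Vmax) by Hanes-Woolf: S/v = S/Vmax + Km/Vmax."""
    s_over_v = [s / v for s, v in zip(substrate, rates)]
    slope, intercept = linear_fit(substrate, s_over_v)
    vmax = 1.0 / slope
    return intercept * vmax, vmax


def turnover_number(vmax, enzyme_conc):
    """kcat = Vmax / [E], with both in consistent concentration units."""
    return vmax / enzyme_conc


# Synthetic, noise-free example generated with Km = 50 and Vmax = 10
# (arbitrary units), used only to show that the fit recovers them.
substrate = [10.0, 25.0, 50.0, 100.0, 200.0]
rates = [10.0 * s / (50.0 + s) for s in substrate]
km, vmax = michaelis_menten_fit(substrate, rates)
print(f"Km = {km:.2f}, Vmax = {vmax:.2f}")
```

With real, noisy data a direct nonlinear fit is often preferred over linearization; the linear form is used here only because it needs no external libraries.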

Activities as a function of pH were measured for the purified ECB 
deacylases at 30°C at different pH values: 5, 5.5 and 6 (0.1 M acetate buffer); 7, 7.5, 8 
and 8.5 (0.1 M phosphate buffer); 9 and 10 (0.1 M carbonate buffer) using the HPLC 
assay. Stabilities of purified ECB deacylases were determined at 30°C in 0.1 M 
NaAc buffer (pH 5.5) containing 10% methanol. Samples were withdrawn at different 
time intervals, and the residual activity was measured in the same buffer with the HPLC 
assay described above. 

B. Results 

Fig. 11 shows that after one round of applying this heteroduplex repair 
technique on the mutant M7-2 and M16 genes, one mutant (M15) from about 500 original 
transformants was found to possess 3.1 times the specific activity of wild-type. 
Wild type and evolved M15 ECB deacylases were purified and their kinetic parameters 
for deacylation of ECB were determined by HPLC. The evolved deacylase M15 has a 
catalytic rate constant, kcat, increased by 205%. The catalytic efficiency (kcat/Km) of M20 
is enhanced by a factor of 2.9 over the wild-type enzyme. 

Initial rates of deacylation with the wild type and M15 at different pH 
values from 5 to 10 were determined at 200 µg/ml of ECB. The recombined M15 is 
more active than wild type at pH 5-8. Although the pH dependence of the enzyme 
activity in this assay is not strong, there is a definite shift of 1.0-1.5 units in the optimum 
to lower pH, as compared to wild type. 

The time course of deactivation of the purified ECB deacylase mutant 
M15 was measured in 0.1 M NaAc (pH 5.5) at 30°C. No significant difference in 
stability was observed between wild type and mutant M15. 

The DNA mutations with respect to the wild type ECB deacylase sequence 
and the positions of the amino acid substitutions in the evolved variants M7-2, M16 and 
M15 are summarized in Figure 12. 

The heteroduplex recombination technique can recombine parent 
sequences to create novel progeny. Recombination of the M7-2 and M16 genes yielded 
M15, whose activity is higher than that of either parent (Fig. 13). Of the six base 
substitutions in M15, five (at positions α50, α171, β57, β129 and β340) were inherited 
from M7-2, and the other one (β30) came from M16. 

This approach provides an alternative to existing methods of DNA 
recombination and is particularly useful in recombining large genes or entire operons. 
This method can be used to create recombinant proteins to improve their properties or to 
study structure-function relationships. 

EXAMPLE 3. Novel Thermostable Bacillus Subtilis Subtilisin E Variants 

This example demonstrates the use of in vitro heteroduplex formation 
followed by in vivo repair for combining sequence information from two different 
sequences in order to improve the thermostability of Bacillus subtilis subtilisin E. 
Genes RC1 and RC2 encode thermostable B. subtilis subtilisin E variants 

(88). The mutations at base positions 1107 in RC1 and 995 in RC2 (Figure 14), giving 
rise to amino acid substitutions Asn218/Ser (N218S) and Asn181/Asp (N181D), lead to 
improvements in subtilisin E thermostability; the remaining mutations, both synonymous 
and nonsynonymous, have no detectable effects on thermostability. At 65°C, the single 
variants N181D and N218S have approximately 3-fold and 2-fold longer half-lives, 
respectively, than wild-type subtilisin E, and variants containing both mutations have 
half-lives that are 8-fold longer (88). The different half-lives in a population of subtilisin E 
variants can therefore be used to estimate the efficiency by which sequence information is 
combined. In particular, recombination between these two mutations (in the absence of 
point mutations affecting thermostability) should generate a library in which 25% of the 
population exhibits the thermostability of the double mutant. Similarly, 25% of the 
population should exhibit wild-type like stability, as N181D and N218S are eliminated at 
equal frequency. We used the fractions of the recombined population as a diagnostic 

A. Methods 

The strategy underlying this example is shown in Fig. 15. 
Subtilisin E thermostable mutant genes RC1 and RC2 (Fig. 14) are 986-bp 
fragments including 45 nt of subtilisin E prosequence, the entire mature sequence and 113 
nt after the stop codon. The genes were cloned between Bam HI and Nde I in the E. coli/B. 
subtilis shuttle vector pBE3, resulting in pBE3-1 and pBE3-2, respectively. Plasmid 
DNA pBE3-1 and pBE3-2 was isolated from E. coli SCS110. 

About 5.0 µg of unmethylated pBE3-1 and pBE3-2 DNA were digested 
with Bam HI and Nde I, respectively, at 37°C for 1 hour. After agarose gel separation, 
equimolar concentrations (2.0 nM) of the linearized unmethylated pBE3-1 and pBE3-2 
were mixed in 1 x SSPE buffer (180 mM NaCl, 1.0 mM EDTA, 10 mM NaH2PO4, pH 
7.4). After heating at 96°C for 10 minutes, the reaction mixture was immediately cooled 
at 0°C for 5 min. The mixture was incubated at 68°C for 2 hr for heteroduplexes to form. 
One microliter of the reaction mixture was used to transform 50 µl of E. 
coli ES1301 mutS, E. coli SCS110 and E. coli HB101 competent cells. 

The transformation efficiency with E. coli HB101 competent cells was 
about ten times higher than that of E. coli SCS110 and 15 times higher than that of E. coli 
ES1301 mutS. But in all these cases, the transformation efficiencies were 10-250 times 
lower than that of the transformation with closed, covalent and circular control pUC19 
plasmids. 

Five clones from the E. coli SCS110 mutant library and five from the E. coli 
ES1301 mutS library were randomly chosen, and plasmid DNA was isolated using a 
QIAprep spin plasmid miniprep kit for further DNA sequencing analysis. 

About 2,000 random clones from the E. coli HB101 mutant library were 
pooled and total plasmid DNA was isolated using a QIAGEN-100 column. 0.5-4.0 µg of 
the isolated plasmid was used to transform Bacillus subtilis DB428 as described 
previously (88). 

About 400 transformants from the Bacillus subtilis DB428 library were 
subjected to screening. Screening was performed using the assay described previously 
(88), on succinyl-Ala-Ala-Pro-Phe-p-nitroanilide. B. subtilis DB428 containing the 
plasmid library were grown on LB plates containing kanamycin (20 µg/ml). After 
18 hours at 37°C single colonies were picked into 96-well plates containing 200 µl 
SG/kanamycin medium per well. These plates were incubated with shaking at 37°C for 
24 hours to let the cells grow to saturation. The cells were spun down, and the 
supernatants were sampled for the thermostability assay. 

Two replicates of 96-well assay plates were prepared for each growth plate 
by transferring 10 µl of supernatant into the replica plates. The subtilisin activities were 
then measured by adding 100 µl of activity assay solution (0.2 mM 
succinyl-Ala-Ala-Pro-Phe-p-nitroanilide, 100 mM Tris-HCl, 10 mM CaCl2, pH 8.0, 37°C). 
Reaction velocities were measured at 405 nm over 1.0 min in a ThermoMax microplate 
reader (Molecular Devices, Sunnyvale, CA). Activity measured at room temperature was 
used to calculate the fraction of active clones (clones with activity less than 10% of that of 
wild type were scored as inactive). Initial activity (Ai) was measured after incubating one 
assay plate at 65°C for 10 minutes by immediately adding 100 µl of prewarmed (37°C) 
assay solution (0.2 mM succinyl-Ala-Ala-Pro-Phe-p-nitroanilide, 100 mM Tris-HCl, 
10 mM CaCl2, pH 8.0) into each well. Residual activity (Ar) was measured after 40 minute 
incubation. 
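
The ratio Ar/Ai can be turned into an approximate half-life if thermal inactivation is assumed to follow first-order kinetics. That assumption, the function name and the numbers in the usage line are ours and serve only as a sketch. The elapsed time between the two measurements is left as a parameter, because the text does not state whether the 40 minutes are counted from the start of the incubation or from the Ai measurement.

```python
import math


def half_life_min(a_initial, a_residual, elapsed_min):
    """Half-life from two activity readings, assuming first-order decay.

    A(t) = A_i * exp(-k * t), so k = ln(A_i / A_r) / t and t_half = ln(2) / k.
    """
    if a_initial <= 0 or a_residual <= 0:
        raise ValueError("activities must be positive")
    if a_residual >= a_initial:
        raise ValueError("no measurable inactivation between the readings")
    rate_constant = math.log(a_initial / a_residual) / elapsed_min
    return math.log(2.0) / rate_constant


# Hypothetical well: activity falls to one quarter over 30 minutes at 65 C.
print(half_life_min(a_initial=1.0, a_residual=0.25, elapsed_min=30.0))
```

Ranking clones by Ar/Ai gives the same order as ranking by this half-life, so the conversion matters only when absolute half-lives are compared with literature values.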

B. Results 

In vitro heteroduplex formation and in vivo repair was carried out as 
described above. Five clones from the E. coli SCS110 mutant library and five from the 
E. coli ES1301 mutS library were selected at random and sequenced. Fig. 14 shows that four 
out of the ten clones were different from the parent genes. The frequency of occurrence 
of a particular point mutation from parent RC1 or RC2 in the resulting genes ranged from 
0% to 50%, and the ten point mutations in the heteroduplex have been repaired without 
strong strand-specific preference. 

Since none of the ten mutations is located within a dcm site, the mismatch 
repair appears to be carried out generally via the E. coli long-patch mismatch repair 
system. The system repairs different mismatches in a strand-specific manner using the 
state of N6-methylation of adenine in GATC sequences as the major mechanism for 
determining the strand to be repaired. With heteroduplexes methylated at GATC 
sequences on only one DNA strand, repair was shown to be highly biased to the 
unmethylated strand, with the methylated strand serving as the template for correction. If 
neither strand was methylated, mismatch repair occurred, but showed little strand 
preference (23, 24). These results show that it is preferable to demethylate the DNA to be 
recombined to promote efficient and random repair of the heteroduplexes. 

The rates of subtilisin E thermo-inactivation at 65°C were estimated by 
analyzing the 400 random clones from the Bacillus subtilis DB428 library. The 
thermostabilities obtained from one 96-well plate are shown in Figure 16, plotted in 
descending order. About 12.9% of the clones exhibited thermostability comparable to the 
mutant with the N181D and N218S double mutations. Since this rate is only half of that 
expected for random recombination of these two markers, it indicates that the two 
mismatches at positions 995 and 1107 within the heteroduplexes have been repaired with 
lower position randomness. 
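
The 25% expectation for fully random recombination of the two markers can be reproduced by enumerating the four repair outcomes. The sketch below assumes that each mismatch is resolved independently and that either parental base is retained with probability 0.5; the function and label names are hypothetical.

```python
from itertools import product


def genotype_fractions(p_keep_n181d=0.5, p_keep_n218s=0.5):
    """Expected genotype fractions when two mismatches are repaired independently."""
    fractions = {}
    for keeps_181, keeps_218 in product((True, False), repeat=2):
        p_181 = p_keep_n181d if keeps_181 else 1.0 - p_keep_n181d
        p_218 = p_keep_n218s if keeps_218 else 1.0 - p_keep_n218s
        if keeps_181 and keeps_218:
            label = "double mutant"
        elif keeps_181:
            label = "N181D only"
        elif keeps_218:
            label = "N218S only"
        else:
            label = "wild-type like"
        fractions[label] = p_181 * p_218
    return fractions


expected = genotype_fractions()
observed_double = 0.129  # fraction reported above for the screened clones
ratio = observed_double / expected["double mutant"]
print(expected)
print(f"observed / expected double mutants: {ratio:.2f}")
```

The ratio of about 0.5 is the "only half" noted above. The independence assumption is exactly what the observation calls into question.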

Sequence analysis of the clone exhibiting the highest thermostability 
among the screened 400 transformants from the E. coli SCS110 heteroduplex library 
confirmed the presence of both N181D and N218S mutations. Among the 400 
transformants from the B. subtilis DB428 library that were screened, approximately 91% 
of the clones expressed N181D- and/or N218S-type enzyme stabilities, while about 8.0% 
of the transformants showed only wild-type subtilisin E stability. 

Less than 1.0% of the clones were inactive, indicating that few new point 
mutations were introduced in the recombination process. This is consistent with the fact 
that no new point mutations were identified in the ten sequenced genes (Figure 14). 
While point mutations may provide useful diversity for some in vitro evolution 
applications, they can also be problematic for recombination of beneficial mutations, 
especially when the mutation rate is high. 

EXAMPLE 4. Optimizing Conditions For The Heteroduplex Recombination 

We have found that the efficiency of heteroduplex recombination can 
differ considerably from gene to gene [17, 57]. In this example, we investigate and 
optimize a variety of parameters that improve recombination efficiency. 
DNA substrates used in this example were site-directed mutants of green fluorescent 
protein from Aequorea victoria. The GFP mutants had a stop codon(s) introduced at 
different locations along the sequence that abolished their fluorescence. Fluorescent wild 
type protein could only be restored by recombination between two or more mutations. 
The fraction of fluorescent colonies was used as a measure of recombination efficiency. 

A. Methods 

About 2-4 µg of each parent plasmid was used in one recombination 
experiment. One parent plasmid was digested with Pst I endonuclease and the other parent 
with EcoRI. Linearized plasmids were mixed together and 20 x SSPE buffer was added to 
a final concentration of 1 x (180 mM NaCl, 1 mM EDTA, 10 mM NaH2PO4, pH 7.4). The 
reaction mixture was heated at 96°C for 4 minutes, immediately transferred to ice for 4 
minutes, and the incubation was continued for 2 hours at 68°C. 
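
Bringing a DNA mixture to 1 x SSPE with a 20 x stock is a simple dilution. As a small worked example, the sketch below computes the stock volume to add, assuming the DNA sample itself contributes no SSPE. The function name and the 19 µl sample volume are hypothetical.

```python
def stock_volume_to_add(sample_volume_ul, stock_x=20.0, final_x=1.0):
    """Volume of concentrated buffer to add so the mixture ends at final_x.

    Solves stock_x * v / (sample_volume_ul + v) = final_x for v.
    """
    if stock_x <= final_x:
        raise ValueError("stock must be more concentrated than the target")
    return sample_volume_ul * final_x / (stock_x - final_x)


# Example: 19 ul of mixed linearized plasmids needs 1 ul of 20 x SSPE.
print(stock_volume_to_add(19.0))
```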

Target genes were amplified in a PCR reaction with primers corresponding 
to the vector sequence of pGFP plasmid. Forward primer: 
5'-CCGACTGGAAAGCGGGCAGTG-3'; reverse primer: 
5'-CGGGGCTGGCTTAACTATGCGGO\. PCR products were mixed together and purified 
using a Qiagen PCR purification kit. Purified products were mixed with 20 x SSPE buffer 
and hybridized as described above. Annealed products were precipitated with ethanol or 
purified on Qiagen columns and digested with EcoRI and PstI enzymes. Digested 
products were ligated into PstI and EcoRI digested pGFP vector. 

dUTP was added into the PCR reaction at final concentrations of 200 µM, 40 µM, 
8 µM, 1.6 µM and 0.32 µM. The PCR reaction and subsequent cloning procedures were 
performed as described above. 
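
The five dUTP concentrations form a five-fold serial dilution starting at 200 µM. A short sketch that regenerates the series (the function name is hypothetical):

```python
def serial_dilution(start, factor, steps):
    """Concentrations obtained by diluting `start` by `factor` at each step."""
    return [start / factor ** i for i in range(steps)]


dutp_um = serial_dilution(start=200.0, factor=5, steps=5)
print(dutp_um)
```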

Recombinant plasmids were transformed into the XL10 E. coli strain by a 
modified chemical transformation method. Cells were plated on ampicillin-containing LB 
agar plates and grown overnight at 37°C, followed by incubation at room temperature or 
at 4°C until fluorescence developed. 

B. Results 

1. Effect of ligation on recombination efficiency. 

Two experiments have been performed to test the effect of breaks in the 
DNA heteroduplex on the efficiency of recombination. In one experiment heteroduplex 
plasmid was treated with DNA ligase to close all existing single-strand breaks and was 
transformed in identical conditions as an unligated sample (see Table 1). The ligated 
samples show up to 7-fold improvement in recombination efficiency over unligated 
samples. 

In another experiment, dUTP was added into the PCR reaction to introduce 
additional breaks into DNA upon repair by uracil N-glycosylase in the host cells. Table 
2 shows that dUMP incorporation significantly suppressed recombination, the extent of 
suppression increasing with increased dUTP concentration. 

2. Effect of plasmid size on the efficiency of 
heteroduplex formation. 

Plasmid size was a significant factor affecting recombination efficiency. 
Two plasmids, pGFP (3.3 kb) and a shuttle vector pCT1 (about 9 kb), were used 
in preparing circular heteroduplex-like plasmids following the traditional heteroduplex 
protocol. For the purpose of this experiment (to study the effect of plasmid size on 
duplex formation), both parents had the same sequences. While pGFP formed about 
30-40% of circular plasmid, the shuttle vector yielded less than 10% of this form. 
An increase in plasmid size decreases the concentration of the ends in the vicinity 
of each other and makes annealing of very long (>0.8 kb) single-stranded ends more 
difficult. This difficulty is avoided by the procedure shown in Fig. 3, in which 
heteroduplex formation occurs between substrates in vector-free form, and 
heteroduplexes are subsequently inserted into a vector. 

3. Efficiency of Recombination vs. Distance Between Mutations 

A series of GFP variants was recombined pairwise to study the effect of 
distance between mutations on the efficiency of recombination. Parental genes were 
amplified by PCR, annealed and ligated back into pGFP vector. Heteroduplexes were 
transformed into the XL10 E. coli strain. 

The first three columns in Table 3 show the results of three independent 
experiments and demonstrate the dependence of recombination efficiency on the distance 
between mutations. As expected, recombination becomes less and less efficient for very 
close mutations. 

However, it is still remarkable that long-patch repair has been able to 
recombine mutations separated by only 27 bp. 

The last line in Table 3 represents recombination between one single and 
one double mutant. Wild type GFP could only be restored in the event of a double 
crossover, with each individual crossover occurring within a distance of only 99 bp, 
demonstrating the ability of this method to recombine multiple, closely-spaced mutations. 

4. Elimination Of The Parental Double Strands 
From Heteroduplex Preparations. 

Annealing of substrates in vector-free form offers size advantages relative 
to annealing of substrates as components of vectors, but does not allow selection for 
heteroduplexes relative to homoduplexes simply by transformation into host. 

Asymmetric PCR reactions with only one primer for each parent, seeded with an 
appropriate amount of previously amplified and purified gene fragment, were run for 100 
cycles, ensuring a 100-fold excess of one strand over the other. Products of these 
asymmetric reactions were mixed and annealed together, producing only a minor amount 
of nonrecombinant duplexes. The last column in Table 3 shows the recombination 
efficiency obtained from these enriched heteroduplexes. Comparison of the first three 
columns with the fourth one demonstrates the improvement achieved by asymmetric 
synthesis of the parental strands. 
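
The 100-fold strand excess follows from the linear (rather than exponential) amplification obtained with a single primer. The sketch below uses an idealized model of our own: in every cycle each seeded template strand gives rise to one new copy of the primed strand, newly made copies are not themselves templates for that primer, and the per-cycle efficiency stays constant. Real reactions will deviate from this.

```python
def strand_excess(cycles, efficiency=1.0):
    """Ratio of primer-extended strand to its complement after single-primer PCR.

    Starts from one duplex (one copy of each strand). Each cycle adds
    `efficiency` new copies of the primed strand per template strand, so the
    excess grows linearly with the number of cycles.
    """
    primed_strand = 1.0
    template_strand = 1.0
    for _ in range(cycles):
        primed_strand += efficiency * template_strand
    return primed_strand / template_strand


print(strand_excess(100))
```

Running the complementary single-primer reaction on the second parent and annealing the two products then favors heteroduplexes, since each reaction is dominated by one strand.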

While the foregoing invention has been described in some detail for 
purposes of clarity and understanding, it will be clear to one skilled in the art from a 
reading of this disclosure that various changes in form and detail can be made without 
departing from the true scope of the invention. All publications and patent documents 
cited in this application are incorporated by reference in their entirety for all purposes to 
the same extent as if each individual publication or patent document were so individually 
denoted. 

WHAT IS CLAIMED IS: 

1. A method for evolving a polynucleotide toward acquisition of a 
desired property, comprising 
(a) incubating a population of parental polynucleotide variants under 
conditions to generate annealed polynucleotides comprising heteroduplexes; 
(b) exposing the heteroduplexes to a cellular DNA repair system to 
convert the heteroduplexes to parental polynucleotide variants or recombined 
polynucleotide variants; 
(c) screening or selecting the recombined polynucleotide variants for the 
desired property. 

2. The method of claim 1, wherein the heteroduplexes are exposed to 
the cellular DNA repair system in vitro. 

3. The method of claim 2, wherein the cellular DNA repair system 
comprises cellular extracts. 

4. The method of claim 1, further comprising introducing the 
heteroduplexes into cells, whereby the heteroduplexes are exposed to the DNA repair 
system of the cells in vivo. 

5. The method of claim 4, wherein the annealed polynucleotides 
further comprise homoduplexes and the introducing step selects for transformed cells 
comprising the heteroduplexes relative to transformed cells comprising homoduplexes. 

6. The method of claim 4, wherein a first polynucleotide variant is 
provided as a component of a first vector, and a second polynucleotide variant is provided 
as a component of a second vector, and the method further comprises converting the first 
and second vectors to linearized forms in which the first and second polynucleotide 
variants occur at opposite ends, whereby in the incubating step single-stranded forms of 
the first linearized vector reanneal with each other to form linear first vector, 
single-stranded forms of the second linearized vector reanneal with each other to form linear 
second vector, and single-stranded linearized forms of the first and second vectors anneal 
with each other to form a circular heteroduplex bearing a nick in each strand, and the 
introducing step selects for transformed cells comprising the circular 
heteroduplexes relative to the linear first and second vector. 

7. The method of claim 6, wherein the first and second vectors are 
converted to linearized forms by PCR. 

8. The method of claim 6, wherein the first and second vectors are 
converted to linearized forms by digestion with first and second restriction enzymes. 

9. The method of claim 1, wherein the population of polynucleotide 
variants are provided in double stranded form, and the method further comprising 
converting the double stranded polynucleotides to single stranded polynucleotides before 
the annealing step. 

10. The method of claim 1, wherein the converting step comprises: 
conducting asymmetric amplification of the first and second double 
stranded polynucleotide variants to amplify a first strand of the first polynucleotide 
variant, and a second strand of the second polynucleotide variant, whereby the first and 
second strands anneal in the incubating step to form a heteroduplex. 

11. The method of claim 10, wherein the first and second 
double-stranded polynucleotide variants are provided in vector-free form, and the method 
further comprises incorporating the heteroduplex into a vector. 

12. The method of claim 4, wherein the population of polynucleotides 
comprises first and second polynucleotides provided in double stranded form, and the 
method further comprises incorporating the first and second polynucleotides as 
components of first and second vectors, whereby the first and second polynucleotides 
occupy opposite ends of the first and second vectors, whereby in the incubating step 
single-stranded forms of the first linearized vector reanneal with each other to form linear 
first vector, single-stranded forms of the second linearized vector reanneal with each 
other to form linear second vector, and single-stranded linearized forms of the first and 
second vectors anneal with each other to form a circular heteroduplex bearing a nick in each 
strand, and the introducing step selects for transformed cells comprising the 
circular heteroduplexes relative to the linear first and second vector. 

13. The method of claim 4, further comprising sealing nicks in the 
heteroduplexes to form covalently-closed circular heteroduplexes before the introducing 
step. 

14. The method of claim 11, wherein the first and second 
polynucleotides are obtained from chromosomal DNA. 

15. The method of claim 1, further comprising repeating steps (a)-(c) 
whereby the incubating step in a subsequent cycle is performed on recombinant variants 
from a previous cycle. 

16. The method of claim 1, wherein the polynucleotide variants encode 
a polypeptide. 

17. The method of claim 1, wherein the population of polynucleotide 
variants comprises at least 20 variants. 

18. The method of claim 1, wherein the population of polynucleotide 
variants are at least 10 kb in length. 

19. The method of claim 1, wherein the population of polynucleotide 
variants comprises natural variants. 

20. The method of claim 1, wherein the population of polynucleotides 
comprises variants generated by mutagenic PCR. 

21. The method of claim 1, wherein the population of polynucleotide 
variants comprises variants generated by site directed mutagenesis. 

22. The method of claim 1, wherein the cells are bacterial cells. 

23. The method of claim 1, further comprising at least partially 
demethylating the population of variant polynucleotides. 

24. The method of claim 23, wherein the at least partially 
demethylating step is performed by PCR amplification of the population of variant 
polynucleotides. 

25. The method of claim 23, wherein the at least partially 
demethylating step is performed by amplification of the population of variant 
polynucleotides in host cells. 

26. The method of claim 25, wherein the host cells are defective in a 
gene encoding a methylase enzyme. 

27. The method of claim 1, wherein the population of variant 
polynucleotide variants comprises at least 5 polynucleotides having at least 90% sequence 
identity with one another. 

28. The method of claim 1, further comprising isolating a screened 
recombinant variant. 

29. The method of claim 28, further comprising expressing a screened 
recombinant variant to produce a recombinant protein. 

30. The method of claim 29, further comprising formulating the 
recombinant protein with a carrier to form a pharmaceutical composition. 

31. The method of claim 1, wherein the polynucleotide variants encode 
enzymes selected from the group consisting of proteases, lipases, amylases, cutinases, 
cellulases, amylases, oxidases, peroxidases and phytases. 

32. The method of claim 1, wherein the polynucleotide variants encode 
a polypeptide selected from the group consisting of insulin, ACTH, glucagon, 
somatostatin, somatotropin, thymosin, parathyroid hormone, pigmentary hormones, 
somatomedin, erythropoietin, luteinizing hormone, chorionic gonadotropin, hypothalamic 
releasing factors, antidiuretic hormones, thyroid stimulating hormone, relaxin, interferon, 
thrombopoietin (TPO), and prolactin. 

33. The method of claim 1, wherein the polynucleotide variants 
encode a plurality of enzymes forming a metabolic pathway. 

34. The method of claim 1, wherein the polynucleotide variants are in 
concatemeric form. 



WO 99/29902    1 / 21    PCT/US98/25698 








[Fig. 3: flow diagram; surviving step labels: Digestion, Ligation, Transformation, Transformants] 







[Fig. 4: flow diagram; surviving labels: vector, Restriction (A, B, X and Y), Ligation, vector, Transformation, Transformants] 








Fig. 11 






0-WT 

1-M7-2 
    α50      GTG → GTA 
    α171     CCG → CCT 
    Eβ57G 
    Vβ129M   G → A 
    Vβ340A   T → C 

1-M16 
    SPI-3    GCC → ACC 
    α148     ACG → ACA 
    Sα149T 
    Eβ30A    A → C 
    Vβ166A 
    β233     CGG → AGG 
    β445     ATC → ATT 

Heteroduplex recombination 

M15 
    α50      GTG → GTA 
    α171     CCG → CCT 
    Eβ30A    A → C 
    Eβ57G    A → G 
    Vβ129M   G → A 
    Vβ340A   GTC → GCC 

Fig. 12 






1 CTGCAGCGTGCCCAGCTGTTCGTGGTGGTGATCGCGGCCGCGCTGGCCGCCGTCGCGGTC 
61 GCCGCCGCCGGGCCGATCGAGTTCGTCGCCTTCGTCGTGCCGCAGATCGCCCTGCGGCTC 
121 TGCGGCGGCAGCCGGCCGCCCCTGCTCGCCTCGGCGATGCTCGGCGC6CTGCTCGTGGTC 
181 GGCGCCGACCTGCTCGCTCAGATCGTGGTGGCGCCGAAGGAGCTGCCGGTCGGCCTGCTC
241 ACCGCGATCATCGGCACCCCGTACCTGCTCTGGCTCCTGCTTCGGCGATCAAGAAAGGTG 
301 AGCGGATGAACGCCCGCCTGCGTGGCGAGGGCCTGCACCTCGCGTACGGGGACCTGACCG
361 TGATCGACGGCCTCGACGTCGACGTGCACGACGGGCTGGTCACCACCATCATCGGGCCCA 
421 ACGGGTGCGGCAAGTCGACGCTGCTCAAGGCGCTCGGCCGGCTGCTGCGCCCGACCGGCG 
481 GGCAGGTGCTGCTGGACGGCCGCCGCATCGACCGGACCCCCACCCGTGACGTGGCCCGGG 
541 TGCTCGGCGTGCTGCCGCAGTCGCCCACCGCGCCCGAAGGCCTCACCGTCGCCGACCTGG 
601 TGATGCGCGGCCGGCACCCGCACCACACCTGGTTCCGGCAGTGGTCGCGCGACGACGAGG 
661 ACCAGGTCGCCGACGCGCTGCGCTGGACCG ACATGCTGGCGTACGCGGACCGCCCGGTGG 
721 ACGCCCTCTCCGGCGGTCAGCGCCAGCGCGCCTGGATCAGCATGGCGCTGGCCCAGGGCA 
781 CCGACCTGCTGCTGCTGGACGAGCCGACCACCTTCCTCGACCTGGCCCACCAGATCGACG 
841 TGCTGGACCTGCTCCGCCGGCTGCACGCCGAGATGGGCCGG ACCGTGGTG ATGGTGCTGC 
901 ACGACCTGAGCCTGGCCGCCCGGrACGCCGACCGGCTGATCGCGATGAAGGACGGCCGGA 
961 TCGTGGCGAGCCGGCCGCCGGACGAGGTGCTCACCCCGGCGCTCCTGGACTCGGTCTTCG 
1021 QGCTGCGCGCGATGGTGGTGCCCCACCCGGCGACCGGCACCCCGCTGGTGATCCCCCTGC 
1081 CGCGCCCCGCCACCTCGGTGCGGGCCTGAAATCGATGAGCGTGGTTGCTTC ATCG6CCTG
1141 CCGAGCGATGAGAGTATGTGGGCGGTAGAGCGAGTCTCGAGGGGGAGATGCCGCCOTGAC 

V T 

1201 GTCCTCGTACATGCGCCTGAAAGCAGCAGCGATCGCCTTOXSTGTGATCGTGGC^^ 
3 SSYMRLKAAAIAFGVIVATA 

1261 AGCCGTGCCGTCACCCGCTTCCGGCAGGGAAC ATGACGGCGOCTATGCCGCCCTGATCCG 
23 AVPSPASGREHDGGYAALIR 

1321 CCGGGCCTCGTACGGCGTCCCCCACATCACCGCCGACGACTTCGGGAGCCTCGGTrrCGG
43 RASYGVPKITADDFGSLGFG

1381 CCTCCOGTACGTGCAGGCCGAGGACAACATCTGCGTCATCGCCGACAGCCTAGTGACGGC
63 VGYVQAEDNICVIAESVVTA 

1441 C AACGGTGACCGGTCGCGGTGGTTCGGTGCGACCGGGCCGGACGACGCCGATGTGCGCAG 
83 NGERSRWFGATGPDOADVRS

1501 CGACCTCTTCCACCGCAAGGCGATCGACGACCGCGTCGCCGAGCGGCTCCTCGAAGGGCC 
103 DLPHRKAIDDRVAERLLEGP 

1561 CCGCGACGGCGTGCGGGCGCCGTCGCACGACGTCCGGGACCAC3ATGCGCGCCTTCGTCGC 
123 RDGVRAPSDDVRDQMRGFVA 

1621 CGGCTACAACCACTTCCTACGCCGC ACCGGCGTGCACCGCCTGACCGACCCGGCGTGCCG 
143 GYNHFLRRTGVHRLTDPACR

1681 CGGCAAGGCCTGGGTGCGCCCGCTCTCCGACATCGATCTCTGGCGTACGTCGTQGGAC^ 
163 GKAWVRPLSEIDLWRTSWDS

1741 CATGGTCCGGGCCGGTTCCGGGGCGCTGCTCG ACGGCATCGTCGCCGCGACGCCACCTAC 
183 MVRAGSGALLDGIVAATPPT

1801 AGCCGCCGGGCCCGCGTCAGCCCCGCACCCACCCGACGCCGCCGCGATCGCCGCCGCCCT 
203 AAGPASAPEAPDAAAIAAAL 

1861 CGACGGGACGAGCGCGGGCATCGGCAGCAACGCGTACGGCCTCGGCGCGCAGGCCACCGT 
223 DGTSAGIGSNAYGLGAQATV 

1921 GAACGGCAGCGGGATGGTGCTGGCCAACCCGCACTTCCCGTGGCAGGGCGCCGCACGCTT 
243 NGSGKVLANPHFPWQGAARF

1981 CTACCGGA7GCACCTCAAGGTGCCCGGCCGCTACGACGTCGAGGGCGCGGCGCTGATCGG 
263 VRMHLKVPGRYDVEGAALIG 

2041 CGACCCGATCATCGGGATCGGGCACAACCGCACGGTCGCCTGGAGCCACACCGTCTCCAC 
283 DPIIGIGHNRTVAWSHTVST 

2101 CGCCCGCCGGTTCGTGTGGC ACCGCCTGAGCCTCGTGCCCGGCGACCCCACCTCCTATTA
303 ARRFVWHRLSLVPGDPTSYY 
2161 CGTCCACGGCCGCCCCGAGCGGATGCGCGCCCGCACGGTCACGGTCCAGACCGGCAOCCC
323 VDGRPERMRARTVTVQTGSG 

2221 CCCGGTCAGCCGCACCTTCCACGACACCCGCTACGGCCCCGTGGCCGTGATGCCGGGCAC 
343 PVSRTFHDTRYGPVAVMPGT 

2281 CTTCGACTGGACGCCGGCC ACCGCGTACGCCATCACCGACGTC AACCCGGGCAACAACCG 
363 FDWTPATAYAITDVNAGNNR 

2341 CGCCTTCGACGGGT6GCTGCGGATGGGCCAGGCCAACGACGTCCGGGCGCTCAAGGCGGT 
383 AFDGWLRKGQAKDVRALKAV

2401 CCTCGACCGGCACCAGTTCCTGCCCTGGGTCAACGTGATCGCCGCCGACGCGCCCGGC^^ 
403 LDRHQFLPWVNVIAADARGE 

2461 GGCCCTCTACGGCGATCATTCGGTCGTCCCCCGGGTGACCGGCGCGCTCGCTGCCGCCTG 
423 ALYGDHSVVPRVTGALAAAC 

2521 CATCCCGGCGCCGTTCCAGCCGCTCTACGCCTCCAGCGGCCAGGCGGTCCTGGACGGTTC 
443 IPAPFQPLYASSGQAVLDOS 

2581 CCGGTCGGACTGCGCGCTCG3CGCCGACCCCGACGCCCCGGTCCCGGGCATTCTCGGCCC 
463 RSDCALGADPDAAVPGILGP 

2641 GGCGAGCCTGCCGGTGCGGTTCCGCGACGACTACGTCACaUCTCCAACGAaWSTCACTG 
483 ASLPVRFRDDYVTMSNDSHW

2701 GCTGGCCAGCCCGGCCGCCCCGCTGGAACGCTTCCCGCGGATCCTCGGCAACGAACGCAC 
503 LASPAAPLEGFPRILGNERT

2761 CCCCCGCAGCCTGCGCACCCGGCTCGCGCTGGACCAGATCCAGCAGCGCCTCGCCGGCAC 
523 PRSLRTRLGLDQIQQRLAGT

2821 GGACGGTCTGCCCGGCAAGGGCTTC ACCACCGCCCGGCTCTGGCAGGTCATGTTCGGCAA 
543 DGLPGKGFTTARLWQVKFGN 
2881 CCGGATGCACGGCGCCGAACTCGCCCGCGACGACCTGGTCGCCCTCTGCCGCCGCCAGCC
563 RMHGAELARDDLVALCRRQP

2941 GACCGCGACCGCCTCGAACGGCGCGATCGTCGACCTCACCGCCCCCTCCACGGCGCTGTC 
583 TATASNGAIVDLTAACTALS

3001 CCGCTTCGATGAGCGTGCCGACCTGGACAGCCGGGGCGCGCACCTGTTCACCCACTTCGC 
603 RFDERADLDSRGAHLFTEFA

3061 CCTCGCGGGCGGAATCAGGTTCGCCGACACCTTCGAiSGTGACCGATCCGGTACGCACCCC 
623 LAGGIRFADTFEVTDPVRTP

3121 GCGCCGTCTGAACACCACGGATCCGCCCGTACGGACGOCGCTCGCCGACGCCGTGCAACC
643 RRLNTTDPRVRTALADAVQR

3181 GCTCGCCGGCATCCCCCTCGACGCGAAGCTGGGAGACATCCACACCGACAGCCGCGGCGA 
663 LAGIPLDAKLGDIHTDSRGE 

3241 ACGGCGCATCCCCATCCACGGTGGCCGCGGGGAAGCAGGCACCTTCyu^CGTGATCACCAA 
683 RRIPIHGGRGEAGTPNVITK

3301 CCCGCTCGTGCCGGGCGTGGGATACCCGCAGGTCGTCCACGGAACATCGTTCGTGATGGC 
703 PLVPGVGYPCVVHGTSFVMA 

3361 CGTCGAACTCGGCCCGCACGGCCCGTCGGGACGGCAGATCCTCACCTATGCGCAGTCGAC 
723 VELGPHGPSGRQILTVAQST 

3421 GAACCCGAACXCACCCTGGTACGCCGACCAGACCGTGCTCTACTCGCGGAAGGGCTGGGA 
743 NPNSPWYADQTVLYSRKGWD

3481 CACC ATCAAGTACACCC AGGCGCAGATCGCGGCCGACCCGAACCTGCGCGTCTACCCGGT 
763 TIKYTBAQIAADPNLRVVRV 

3541 GGC AC AGCGGGCACGCTGACCCACGTC ACGCCGGCTCCCCCCGTGCGGGGGCGCAGGGCO 
3601 CCGATCGTCTCTGCATCGCCGGTCAGCCGGGGCCTGCGTCGACCC5GCGGCGGCCGGTCGA

3661 CGCCCGCGTCCCGGCGCAGCGACTGGCTGAAGCGCCAGGCGTCGGCGGCCCGGGGCAGGT 

3721 TGTTGAACATCACGTACGCCGGGCCGCCGTCGAGGATGCCGGCGAGGTGTCCCAGCTCGG 

3781 CATCCGTGTACAC ATGCCGGGCGCCGGTGATGCCGTCCAGCCGGTAATAGGCC ATCGGCG 

3841 TC AGACTGCGGCGCAGGAACGGGTCGGCGGCGTGCCTCAG6TCCAGCTCCTGGCACAAGC 

3901 CCTCGACCACCTCGTCCGGCCACGGGCCGCGCGGCTCCCACAACAGCCCGACACCGGCCG 
3961 

4021 AACTCGCCGGGCACTGCAG 